ABSTRACT

Scientific and technical journals in biology and 
medicine in recent years have extensively covered a debate about 
whether and how to determine the function and order of human genes on 
human chromosomes and when to determine the sequence of molecular 
building blocks that comprise DNA in those chromosomes. In 1987,
these issues rose to become part of the public agenda. The debate 
involves science, technology, and politics. Congress is responsible 
for "writing the rules" of what various Federal agencies do and for
funding their work. This report surveys the points made so far in the
debate, focusing on those that most directly influence the policy 
options facing the U.S. Congress. Topics covered in this report 
include: (1) DNA mapping; (2) research applications; (3) ethical and 
social issues; (4) organizations and agencies involved in gene
mapping in the United States; (5) project organization; (6) efforts
of other countries; and (7) the transfer of technology. Appendices 
list contract report topics, workshop participants, cost estimates, 
lists of databases, a bibliometric analysis of research, and a 
glossary. (CW)

Foreword



For the past 2 years, scientific and technical journals in biology and medicine have
extensively covered a debate about whether and how to determine the function and 
order of human genes on human chromosomes and when to determine the sequence 
of molecular building blocks that comprise DNA in those chromosomes. In 1987, these
issues rose to become part of the public agenda. The debate involves science, technol- 
ogy, and politics. Congress is responsible for "writing the rules" of what various Federal 
agencies do and for funding their work. This report surveys the points made so far 
in the debate, focusing on those that most directly influence the policy options facing 
the U.S. Congress. 

The House Committee on Energy and Commerce requested that OTA undertake 
the project. The House Committee on Science, Space, and Technology, the Senate Com- 
mittee on Labor and Human Resources, and the Senate Committee on Energy and Natu- 
ral Resources also asked OTA to address specific points of concern to them. Congres- 
sional interest focused on several issues: 

• how to assess the rationales for conducting human genome projects, 

• how to fund human genome projects (at what level and through which mech- 
anisms), 

• how to coordinate the scientific and technical programs of the several Federal 
agencies and private interests already supporting various genome projects, and 

• how to strike a balance regarding the impact of genome projects on international 
scientific cooperation and international economic competition in biotechnology. 

OTA prepared this report with the assistance of several hundred experts throughout
the world. Their help included interviews with OTA staff, comments on drafts of
the report, and information sent to OTA. We want to thank those reviewers and
many others who have contributed to making the report more accurate, balanced, and 
useful. 

This report is one of many OTA reports related to biotechnology and genetics. Recent
reports on related topics are Technologies for Detecting Heritable Mutations in
Human Beings, New Developments in Biotechnology: 1) Ownership of Human Tissues
and Cells, 2) Public Perceptions of Biotechnology, 4) U.S. Investment in Biotechnology,
and Human Gene Therapy.

Chapter 1

Summary 



"We want the maximum good per person; but what is good? To one person it is wilder- 
ness, to another it is ski lodges for thousands. To one it is estuaries to nourish ducks 
for hunters to shoot at; to another it is factory land. Comparing one good with another
is, we usually say, impossible because goods are incommensurable. Incommensurables 
cannot be compared. 

Theoretically this may be true; but in real life, incommensurables are commensurable.
All that is needed is a criterion of judgment and a system of weighing."

Garrett Hardin, "The Tragedy of the Commons,"
Science 162:1243-1248, 1968.

"Congress is the place where we make impossible choices between apples and oranges. 
We do it every year in preparing the largest budget on the planet." 

Congressional staff member, 1988. 

"All legislative powers granted shall be vested in a Congress of the United States ....
No money shall be drawn from the Treasury, but in consequence of appropriations 
made by law. ..." 

Article 1, U.S. Constitution.



The mysteries of inheritance are surrendering 
to modern biology. Over a century ago, Austrian 
monk Gregor Mendel demonstrated that the inheritance
of traits could be most simply explained
if it were controlled by factors passed from one 
generation to the next. These units of inheritance 
came to be called genes. The complete set of genes 
from an organism is called its genome. Some traits 
are best explained by inheritance of single genes 
(e.g., many genetic diseases, colorblindness), but 
most, including many nongenetic diseases, involve 
combinations of multiple genes with environ- 
mental factors. 

Scientists discovered in the 1940s that genes con- 
sisted of DNA (deoxyribonucleic acid), and in the 
1950s they further elucidated the mechanisms of 
inheritance. In 1953, Watson and Crick described 
the structure of DNA— the double helix— which 
provides at once an explanation of how genetic 
material is inherited and how genes direct cellu- 
lar function. DNA encodes the blueprint for every 
living thing; it is packed into chromosomes which 
can be seen under a light microscope. The genome 
of an organism can thus be defined as the DNA 
comprising its chromosomes. Each human cell has 
46 chromosomes in 23 pairs. One chromosome 
of each pair is inherited from each parent. DNA
consists of long chains of chemicals called nucleotide
bases. There are four such bases, represented
most simply as A, C, T, and G. The order of bases 
making up DNA is called its sequence. The DNA 
sequence contains the instructions that specify 
the production of molecules, usually proteins, that 
provide cellular structure and perform biochem- 
ical functions in the cell. 
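
As a hedged illustration of the idea that a DNA sequence is simply an ordered string over the four bases A, C, T, and G, the following minimal Python sketch validates a sequence and tallies its bases. The function names and the sample fragment are assumptions made for illustration; they are not drawn from any genome database or from this report.

```python
from collections import Counter

# The four nucleotide bases named in the text.
BASES = "ACGT"


def validate_sequence(seq: str) -> str:
    """Return the sequence in upper case, or raise if it holds other symbols."""
    seq = seq.upper()
    unknown = sorted(set(seq) - set(BASES))
    if unknown:
        raise ValueError(f"not a DNA base: {unknown}")
    return seq


def base_counts(seq: str) -> dict:
    """Count how often each of the four bases occurs in the sequence."""
    counts = Counter(validate_sequence(seq))
    return {base: counts[base] for base in BASES}


if __name__ == "__main__":
    # Hypothetical seven-base fragment used only for illustration.
    print(base_counts("GATTACA"))
```

A real sequence would run to thousands or millions of characters, but the representation is the same: the order of the letters is the information.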

Our understanding of genetics has advanced 
remarkably in the last three decades as new meth- 
ods of manipulating and analyzing DNA have been 
developed. Recombinant DNA technology enables 
scientists to insert DNA from one organism 
directly into that of another, thereby allowing 
them to study how genes function in relatively 
controlled conditions. New methods to detect and 
purify small amounts of DNA, new techniques to 
handle and analyze DNA that is millions of bases 
long, and novel scientific instruments have aug- 
mented the tools scientists use to understand 
heredity. These powerful and rapidly evolving 
technologies have provoked debate in recent years 
about whether and how to mount a concerted 
research program to map the human genome and 
to determine its DNA sequence. 

To date, the combined efforts of government
agencies, university researchers, and private supporters
of biomedical research have produced
rough but extremely useful maps of DNA mar- 
kers covering most regions of the human chro- 
mosomes. Chromosomal locations of over 1,215 
human genes are now known (of the 50,000 to 
150,000 estimated to exist), including those causing
all 20 of the most common genetic diseases.
Sequencing of DNA from human beings has increased
sharply in recent years, yet far fewer than
1 percent of the more than 3 billion bases com- 
prising the human genome have been sequenced 
(see figure 1-1). The function of only a few hun- 
dred human genes is known. Some genetic dis- 
orders are understood at the molecular level (e.g., 
sickle cell disease and Tay-Sachs disease), but the
mechanisms underlying most genetic diseases re- 
main unknown. Genetic factors underlying other 
diseases are known only in barest outline. 
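
The proportions quoted above can be checked with a little arithmetic. The Python sketch below uses only the figures given in the text (1,215 mapped genes, an estimated 50,000 to 150,000 genes, and roughly 3 billion bases); the function and variable names are assumptions made for illustration.

```python
MAPPED_GENES = 1_215                 # genes with known chromosomal locations
GENE_ESTIMATES = (50_000, 150_000)   # low and high estimates of total genes
GENOME_BASES = 3_000_000_000         # approximate bases in the human genome


def percent(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


def mapped_gene_range() -> tuple:
    """Percentage of genes mapped under the high and the low gene estimate."""
    low_estimate, high_estimate = GENE_ESTIMATES
    return (percent(MAPPED_GENES, high_estimate),
            percent(MAPPED_GENES, low_estimate))


def one_percent_of_genome() -> int:
    """Number of bases corresponding to 1 percent of the genome."""
    return GENOME_BASES // 100
```

On these figures, roughly 0.8 to 2.4 percent of the estimated genes had been located, and 1 percent of the genome corresponds to 30 million bases.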

The growing power and speed of research in 
molecular biology have led to proposals to apply 
novel molecular biological methods to the genetics
of entire organisms. Research and technology 
efforts aimed at mapping and sequencing large 
portions or entire genomes are called genome 
projects. These proposals would build on experience
already gained from mapping lower organisms
(e.g., yeast, nematodes, and bacteria) and sequencing
some virus genomes and regions of other
organisms, yet they would be more ambitious in
scale and complexity. More specifically, a public
debate began in 1985 about the feasibility of mapping,
and perhaps sequencing, the human genome
and that of certain other organisms. The debate
has often been cast as an on-off decision about
whether there should be a concerted Federal effort,
yet this is an oversimplification. There are
many component projects at different stages of
completion: Systematically making maps of human
chromosomes is a continuation of ongoing
efforts, for example. Databases for genetic infor- 
mation and repositories for research materials are 
essential whether or not there are other special
efforts. Developing new technologies is widely 
agreed to be important and will require focused 
research programs. The most contentious issue
is whether the DNA sequence of all human chromosomes
should be determined. There is little
doubt that large regions of human chromosomes 
will be sequenced eventually, but there is vigor- 
ous debate about whether a massive, concerted 
sequencing effort is warranted. This remains an 
open question that is likely to be resolved only 
after pilot projects to determine the sequence of 
other organisms, small human chromosomes, or 
chromosomal regions of special interest have been 
performed. Pilot projects can demonstrate the 
technologies and should also determine whether
dedicated sequencing efforts are efficient and 
scientifically sensible. 

Two scientific advisory groups, one reporting
to the Department of Energy (DOE) and the other
convened by the National Research Council (NRC)
of the National Academy of Sciences, recommended
augmented funding of $200 million per
year for genome projects. An Office of Technology
Assessment (OTA) workshop attempted to estimate
the costs of major component projects. Projections
fell into the range of $45 to $50 million
per year initially, increasing to $200 to $250 million
per year over 5 years. Funding recommendations
made by the scientific advisory committees
would cover most but not all costs estimated
by OTA.



DEBATES ABOUT MAPPING THE HUMAN GENOME 



The debate about mapping the human genome 
can be traced through several phases. Until the 
1960s, techniques for locating human genes were 
rudimentary, and human genetics was based primarily
on analysis of inheritance patterns of diseases
and other observable traits through family
trees. In the late 1960s and through the 1970s,
scientists developed the first maps of human
genes, based on direct observation of chromosomes.
In successful cases, the location of a gene
could be specified within several million bases of 
DNA. 

In the late 1970s and early 1980s, scientists took
the first steps toward maps of human chromosomes
based on direct biochemical analysis of DNA. DNA
fragments of unknown function but known location
were used to study inheritance of traits far more
precisely than before. Calculations suggested that
DNA markers, which signify the presence or absence
of particular stretches of DNA, could be identified
for regions of all the human chromosomes.

Figure 1-1. Comparative Scale of Mapping

The number of base pairs of DNA in human cells is roughly comparable to the number of people on Earth.
The scale of genetic mapping efforts can be compared to population maps, with chromosomes (50 to 250 million
base pairs) analogous to nations, and genes (thousands to millions of base pairs) to towns.

Markers can be used to trace which pieces of 
DNA, and therefore which parts of chromosomes, 
are inherited from which parent. When a genetic 
trait is caused by a single gene and that gene is 
close to a marker, the marker can be used to ascer- 
tain roughly where the gene is located because 
the two are inherited together. 
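
The reasoning that a marker and a nearby gene are inherited together can be sketched numerically. The Python example below is a simplified illustration, not a method described in this report. It assumes each child is recorded as a pair of yes/no observations (whether the child inherited the marker, and whether the child shows the trait) from a parent known to carry both on the same chromosome, and it estimates how often the two were separated. The function name and the family data are hypothetical.

```python
def recombination_fraction(children):
    """Fraction of children in which marker and trait were not inherited together.

    Each child is a (has_marker, has_trait) pair of booleans. A value near 0
    suggests the gene lies close to the marker; a value near 0.5 suggests the
    two are inherited independently.
    """
    if not children:
        raise ValueError("at least one child is required")
    separated = sum(1 for has_marker, has_trait in children
                    if has_marker != has_trait)
    return separated / len(children)


if __name__ == "__main__":
    # Hypothetical family data: nine children inherit marker and trait
    # together (or neither), and one child inherits only the marker.
    family = [(True, True)] * 5 + [(False, False)] * 4 + [(True, False)]
    print(recombination_fraction(family))
```

Real linkage studies pool many families and use statistical tests rather than a raw fraction, but the underlying count of children in whom marker and trait part company is the same.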

The U.S. Government and research agencies 
abroad fund most research that uses DNA mar- 
kers to study diseases and physiological functions 
and most university groups searching for new 
markers in chromosomal regions of particular in- 
terest. In the United States, the National Institutes 
of Health (NIH) are the largest funding sources 
for biomedical research on genetics. 

Construction of maps of DNA markers was 
undertaken in the early 1980s. The two largest 
collections of markers were developed by the 
Howard Hughes Medical Institute (HHMI), a pri- 
vate philanthropy, and Collaborative Research, 
Inc., a private corporation. Dozens of university 
researchers and other private firms also contrib- 
uted to this kind of genetic map. 

In 1985, DOE began planning the Human Genome
Initiative to develop research tools for
molecular genetics. Events leading up to the ini- 
tiative included a workshop convened by the 
University of California at Santa Cruz and inter- 
nal planning by DOE administrators. DOE con- 
sidered the initiative an extension of its ongoing 
work in molecular biology— largely focused on de- 
tecting mutations and other biological effects of 
radiation and energy production— that would take 
advantage of research staff and instruments lo- 
cated at the national laboratories, which are 
funded by DOE. DOE held several public meet- 
ings to discuss the technical possibilities. The first 
of these was a workshop held in March 1986 in
Santa Fe, New Mexico. 



Discussion at that workshop of whether to estab- 
lish a reference sequence for the entire human 
genome touched off a controversy that has per- 
sisted ever since. Arguments about the usefulness 
of extensive sequence information reached a high 
pitch at a conference at Cold Spring Harbor Lab- 
oratories in June 1986. Many scientists perceived 
a major sequencing effort as a threat to the con- 
duct of basic research in molecular biology be- 
cause of its projected cost and potential drain on 
research talent. Estimates of the cost of sequenc- 
ing alone (without accounting for mapping or 
preparation of DNA to be sequenced) ran to billions
of dollars. Calls for central management of
such a prodigious undertaking further heightened 
tension because of the strong tradition of decen- 
tralized, small-group research in molecular biol- 
ogy. Debate over the appropriate strategy for 
deciding which regions to sequence first added 
to the din and spilled over into the scientific press. 
Major newspapers and magazines have covered 
the debate since, giving the Human Genome Initiative
a high public profile.

The Cold Spring Harbor discussion was followed 
by a series of meetings held by HHMI, NIH, DOE, 
NRC, OTA, and others. Plans for special research 
initiatives by NIH, DOE, and HHMI have resulted 
from these and other discussions. A few private 
corporations have also been established (or are 
being established) to perform DNA sequencing and 
to develop research resources. 

This report deals with various projects that have 
been proposed by Federal agencies to construct 
maps of human and other chromosomes, to im- 
prove relevant databases and repositories, and to 
improve research methods and instruments. 
There is no single human genome project but 
instead many projects. For 1988, there are spe- 
cific line items in appropriations for DOE and NIH, 
and the bulk of the discussion in this report refers 
to these new research programs. For purposes 
of this report, genome projects refers to the research
programs of NIH, DOE, and HHMI, as
well as parallel programs in the private sector
or other nations.

THE FOCUS OF GENOME PROJECTS



Genome projects have several objectives: 

• to establish, maintain, and enhance databases
containing information about DNA sequences,
location of DNA markers and genes, function
of identified genes, and other related information;

• to create maps of human chromosomes consisting
of DNA markers that would permit scientists
to locate genes quickly;

• to create repositories of research materials,
including ordered sets of DNA fragments that
fully represent DNA in the human chromosomes;

• to develop new instruments for analyzing
DNA;

• to develop new ways to analyze DNA, including
biochemical and physical techniques and
computational methods;

• to develop similar resources for other organisms
that would facilitate biomedical research;
and possibly

• to determine the DNA sequence of a large
fraction of the human genome and that of
other organisms.



Genome projects underway or planned by DOE,
NIH, the National Science Foundation (NSF), HHMI,
and other organizations are different but overlapping.
They share two features: They would put
new methods and instruments into the tool kit
of molecular biology, and they would build a research
infrastructure for geneticists (see table 1-1).

DOE's Human Genome Initiative began in late
1986 and consists of several projects. One is to
create an ordered set of DNA segments from
known chromosomal locations; this set, if widely
available, could save the tedious steps involved
in isolating DNA for study once a gene's approximate
location is known. It should also reduce needless
duplication of effort by different groups studying
genes in the same chromosomal region. A
second project is to develop new computational
methods to enhance analysis of genetic map and
DNA sequence data. Another project is to develop
new techniques and instruments for detecting and
analyzing DNA, including automation and robotics.
For these projects, DOE expended $4.2 million
in 1987 and plans $12 million for 1988. It also
planned to support an additional $7 million in 1987
for related research and infrastructure. DOE has
requested $18.5 million for direct support of its
Human Genome Initiative in fiscal year 1989.

Table 1-1. Principal Organizations Involved in Genome Projects

Organization                        Mission                        Funding ($000,000s) (a)

National Institutes of Health       Biomedical research            Life sciences: 6,170
(Department of Health                                              Related research: 313
and Human Services)                                                Genome projects: 17.2
                                                                   NLM biotechnology databases: 3.83

Department of Energy                Biological effects of          Life sciences: 230
(Office of Health and               energy production and          Related research: 7
Environmental Research,             radiation; use of national     Genome projects: 12
Office of Energy Research)          laboratory resources

National Science Foundation         Basic scientific research      Life sciences: 206
(Directorate of Biological,                                        Related research: 32.7
Behavioral, and Social Sciences)                                   Genome projects: 0.2

Howard Hughes Medical Institute     Biomedical research            Life sciences: 240
                                                                   Genetics: 40
                                                                   Genetic marker maps: 2 to 4
                                                                   Databases: 2

(a) "Life sciences" and "related research" figures are estimates for fiscal year 1987.
"Genome projects" figures are estimates for fiscal year 1988, based on appropriations
under the December 1987 continuing resolution.

SOURCES: Personal communications with agency staff, including Kingsbury (NSF),
June and November 1987, and George Cahill (HHMI).

NIH has supported special genome projects since 
1987, with two objectives: to improve methods
for analyzing the genome of human beings and 
other complex organisms and to enhance com- 
putational methods. NIH also supports most of the 
relevant databases and repositories. It spent an 
estimated $313 million on projects that involved 
mapping and sequencing in 1987, and several million
more on infrastructure. NIH plans somewhat
higher spending for related research in 1988 and 
will have two items in its budget— an additional 
$17.2 million for genome projects and $3.83 million
for increased database support at the National
Library of Medicine. The fiscal year 1989 budget 
request for the National Institute of General Med- 
ical Sciences of NIH includes $28 million for ge- 
nome projects. 

HHMI has two genome initiatives: one to support
key databases containing information about
the genetics of human and other organisms, and 
the other to support biomedical research on basic 
genetic mechanisms and genetic disease. HHMI's 
budget estimates from 1987 included $40 million 
for genetics (including $2 to $4 million for genetic 
mapping) and $2 million to support genome
databases. 

The NSF plans to increase the number of biol- 
ogy centers it supports, in order to develop new 
scientific instrumentation and encourage sharing 
of expensive equipment. These and other NSF pro- 
grams are not genome projects per se, but they 
are likely to be integrated with programs of other 
agencies in some locations. Instrumentation de- 
veloped through the biology centers will probably 
be directly relevant to genome projects. NSF bud- 
get estimates for 1987 were $206 million for life 
sciences, of which $32.7 million went to research 
related to genome projects and $200,000 went 
directly to genome projects. 

Mechanisms for interagency coordination of ge- 
nome projects have evolved over the past 2 years. 
Initially, there was informal communication
among DOE, NIH, NSF, and HHMI. The Federal
agencies then formed a working group under the
Domestic Policy Council (DPC), a cabinet-level
group in the White House. A committee to replace 
the DPC group is now being organized by the 
White House Office of Science and Technology 
Policy (OSTP), but its exact composition and func- 
tion have not yet been determined. 

International efforts are concentrated in developed
nations with strong research traditions. Mapping
genes, both human and nonhuman, has been
an international effort since its inception. Inter- 
national agreements for databases (particularly 
those containing DNA sequence data) and collabo- 
rations on gene mapping (notably, the Center for 
the Study of Human Polymorphism in Paris) have 
been in operation for several years. No foreign 
government has made a commitment yet to 
mapping and sequencing the human genome,
although several governments support related
projects through their usual mechanisms of research
funding. The United Kingdom has supported
one of the pioneering efforts to map the
genome of a nonhuman organism and additional 
work to develop new mapping and sequencing 
technologies. Italy has the most specific commit- 
ment to the human genome: It funded several pi- 
lot projects (up to $1 million per year for 2 years) 
to map and perhaps sequence at least one small 
human chromosome, with the intent of increasing
that budget five- to ten-fold if the projects are
promising. France, the Federal Republic of Germany,
and other Western European nations have
substantial commitments to genetics research and 
are also discussing international cooperation. 
Canada's medical research planning board is con- 
sidering special efforts for genome projects. The 
European Molecular Biology Laboratory and Euro- 
pean Molecular Biology Organization have ex- 
pressed interest in an international collaboration 
to map and sequence the genomes of human and 
nonhuman organisms. 

Eastern European and Asian nations have ex- 
pressed interest in using the resulting data, but 
they have relatively limited programs for genetics 
research. Australia is one possible exception; it 
has consistently increased its share of publications 
related to genetics over the last decade, and it 
would logically be included in any international 
planning. Japan is another exception. Its Science 
and Technology Agency has expended $3.8 million
to support automation of DNA sequencing
technologies; the Ministry of Education supports
a grants program in genetics, and the Ministry
of International Trade and Industry has devoted
several million dollars to study the feasibility of
an expanded international effort called the Human
Frontiers Science Program, which could include
genome research projects.



MISPLACED CONTROVERSY ABOUT
"THE HUMAN GENOME PROJECT"



Over the past several years, the debate about 
genome projects has been vigorous— sometimes 
acrimonious. Many articles have appeared in the 
scientific press and the general press about "the 
genome controversy." The most conspicuous dis- 
agreements, however, have concentrated on is- 
sues that are not central to the conduct of genome 
projects. Disputes among the executive agencies 
have been played up, belying the generally close 
cooperation among DOE, NIH, HHMI, private 
firms, and other groups in conducting their 
respective projects. International cooperation 
among gene mappers and database managers has 
been successful but has attracted little attention. 
Private corporations are already involved in many 
of the projects that are furthest along. One firm 
has developed an extensive map of human genetic 
markers, and others have developed instrumen- 
tation useful in research relevant to the mapping 
and sequencing of DNA. These companies have
offered few complaints about barriers to technology
transfer. Dissent has focused on the importance
of and strategy for sequencing DNA of
the entire genome, yet no agency has made a
commitment to massive sequencing. The current
commitment is to develop technologies that
would make it faster and less costly and to im- 
prove databases to collect and disseminate the re- 
sulting information. DOE has expressed interest 
in a concerted sequencing program, but only 
when technological development reduces its cost 
to tens of millions of dollars, in several years at 
the earliest. 

Some of the debate can be attributed to the title
that has often been applied to genome projects:
the Human Genome Project. The term is a useful
way to link research initiatives and to distinguish
them from ongoing programs for budget
planning. It highlights the ultimate objective
(understanding human biology by developing a
new set of research resources) and captures political
support and broad public interest. It has
had the effect, however, of generating rancorous
debate which has inhibited the development of
consensus on how to improve the research infra- 
structure. The importance of maps, databases, and 
repositories has been obscured by the controversy 
over massive DNA sequencing. 

The title has had several other untoward effects. 
The Human Genome Project centers attention ex- 
clusively on human genetics, but understanding 
human genes will necessarily involve the 
study of other organisms. Many of the resources,
particularly maps of human chromosomes,
will be focused on human beings, but to
interpret human genetic information, similar re- 
sources must be developed for other organisms. 
New instruments and methods will be applicable 
to all DNA. 

The Human Genome Project invites confusion 
by implying that the human genome will be un- 
derstood when the project is over. The immedi- 
ate goal of genome projects is not complete un- 
derstanding, but creating tools to bring about such 
understanding in the 21st century. Understand- 
ing encompasses all biomedical research; it does 
not distinguish genome projects from others. The 
most ambitious possible goal of genome projects 
would be to complete the most detailed map: a 
reference sequence of the entire human genome. 
Even if this were agreed to and developed, it would 
not yield immediate understanding of how that 
DNA sequence is translated to make a human be- 
ing. It would not explain how nerve cells become 
connected in the immensely complex anatomy of 
the brain. It would not even provide complete an- 
swers to how individuals differ or how they have 
evolved. Sequence data, like other genetic information,
is meaningful only when compared among
individuals and correlated with biological function.

There is no single, monolithic Human Genome
Project. In fact, there are several distinct compo- 
nents at various stages of development. Some in- 
struments and many databases already exist; some 
genetic maps are more than half complete; repos- 
itories for DNA used in research have only been 
organized in the past few years; and other projects
are planned but not yet begun. Whether there 
will ever be large and expensive research facilities
for component specific genome projects is
an open question that can be answered only as 
the technologies evolve. 

The Human Genome Project conjures up images
of large-scale projects such as the Manhattan
Project to build the first atomic bomb,
the Apollo Project for a manned Moon landing,
the space station, or the superconducting
supercollider. Genome projects do not belong
in this category. Component genome projects
will not require budgets as large as such megaprojects,
nor are the technical ends as focused.
Genome projects must be distinguished from the 
sequencing of the entire human genome, which 
is but a component still in the planning phase. 
There will be no single event such as the Moon 
landing or the space shuttle launching, nor is there
likely to be construction of a new multi-billion-dollar
facility such as the superconducting supercollider.
Genome projects do not now require such
facilities. Some projects may require facilities to 
perform services for mapping or sequencing in 
the future, yet such facilities would not be larger
than the molecular biology centers already estab- 
lished at a few major research universities. Map- 
ping or sequencing facilities would differ only by 
being devoted to production work rather than 
pure science. The results of genome projects are 
not contingent on completion of large capital- 
intensive dedicated units, and the data and instru- 
ments will be integrated into biology and medi- 
cine as the projects progress. Genome projects are, 
in this respect, analogous to navigational charts 
or road maps, which are useful even as they are 
being updated. Some persons believe a shortage 
of trained scientific and technical personnel in the 
United States could prove troublesome for molecular
biology, but the genome projects proposed thus
far are not so large in scale, even in comparison
to other areas of biology, as to cause shortages
in other areas. Genome projects are relatively modest
compared to other large science projects now
under consideration by the Federal Government. 



THE CORE ISSUE: RESOURCE ALLOCATION FOR
RESEARCH INFRASTRUCTURE 



Most issues that need to be addressed regard- 
ing genome projects are variations on the prob- 
lem of the commons: how to create and maintain 
resources of use to all. It can be difficult to de- 
velop goods useful to all if each individual has no 
direct incentive to pay for them and only a few
are adversely affected. 

The core issue concerning genome projects is 
resource allocation. What priority should be given
to funding databases, materials repositories, 
genetic map projects, and development of new 
technologies? Should genome projects have prece- 
dence over other projects important to biological 
and biomedical research? These projects will ben- 
efit the entire biomedical research community, 
and ultimately the Nation and the world, but their
funding must be drawn from the same agencies 
that support basic research. Funding for genome
projects will thus be taken from agencies that sup- 
port research on neuroscience, cancer, immunol- 
ogy, and many other promising and rapidly mov- 
ing fields. 

The flow of information from molecular biol- 
ogy is overwhelming the resources devoted to han- 
dling it. Federal agencies, HHMI, and other inter- 
ested groups are acting to manage the deluge. 
Research dedicated to improving databases, 
maps, repositories, and research methods
promises to increase efficiency overall by do-
ing once systematically what would otherwise
be duplicated by many groups using more 
primitive technologies. Whether massive, con- 
certed DNA sequencing is similarly efficient can 
only be demonstrated by trying it on a smaller 
scale. 

ORGANIZATION OF THIS REPORT



The following sections describe options for con- 
gressional action. Subsequent chapters address 
the issues raised here in greater detail. Chapter 
2 provides technical background and explains how 
genome projects might be conducted. Chapter 3 
reviews how results might be used in biology and 
medicine. Chapter 4 outlines some long-term so- 
cial and ethical issues surrounding human genome 
projects. Chapter 5 surveys agencies and organi-
zations in the United States actively supporting
human genome projects. Chapter 6 discusses how 
genome projects might be organized among these 
agencies and organizations. Chapter 7 briefly sur- 
veys activities in foreign countries; and chapter 
8 presents issues involved in technology transfer. 
Appendixes contain background on material used 
to produce this report, databases, costs of projects, 
and mapping and sequencing publications.



THE ROLE OF CONGRESS 



Genome projects have come to the attention of 
Congress for three reasons. First, they have be- 
come highly visible because of the extensive de- 
bate surrounding them. Second, they involve agen- 
cies in different executive departments; therefore, 
mechanisms for coordinating them are less clear 
than if they were all in a single department. Third, 
results of genome projects will lead to new scien- 
tific and medical instruments for analysis of DNA, 
development of new genetic tests for use in clini- 
cal diagnosis, and other products and services. 
Techniques developed to analyze DNA will expe- 
dite biological research and will provide data and 
technologies crucial to the development of many 
new products. In this sense, genome projects 
promise economic returns, although the form and
magnitude of them are not predictable. Genome 
projects have thus been linked to international 
competitiveness in biotechnology and its economic 
implications for American commerce in coming 
decades. 

Congress has three roles regarding genome 
projects: 

1. annual appropriations to Federal agencies 
funding the projects; 

2. authorization of actions by executive agen- 
cies to set up formal coordinating structures 
or of specific mandates of agencies; and 

3. oversight of agencies' conduct of their 
projects. 



OPTIONS FOR ACTION BY CONGRESS 



Options for congressional action discussed here 
build on the discussions above and those in chap- 
ters 4 through 6. Background material and de- 
tails can be found in those chapters. 

Appropriations to Federal Agencies 

The pace of federally funded genome projects 
will be determined principally by the annual ap- 
propriations set by Congress and by the execu- 
tive agencies' commitment to the projects. Al- 
though agencies retain some authority to
"reprogram" funds for activities that fall within
their mandates, large efforts cannot be sustained 
without specific appropriations. Appropriations
will set an upper limit on the size and number 
of projects that are federally supported; commit- 
ment by executive agencies, and their grantees 
and contractors, will determine the speed and 
scope of projects within those limits. 

The critical judgment in appropriations is the 
importance of the work to be supported relative 
to other research and activities supported by the 
Federal Government. The two national scientific 
groups that have written reports on genome 
projects, a DOE advisory subcommittee and an
NRC committee, have both recommended substan-
tial additional funding for genome projects, even-
tually equaling $200 million per year. OTA inde-
pendently projected costs of genome projects at
a workshop and through subsequent interviews 
and letters. Appendix B summarizes cost estimates,
including the history of those made by other 
groups, and reviews the process OTA used to 
make its estimates. The cost of funding all com- 
ponent projects was estimated as increasing from 
$47 million the first year to $228 million the fifth 
year. This would permit strengthening of data- 
bases and repositories, construction of several va- 
rieties of chromosomal maps, development of 
many new technologies, and initiation of pilot 
projects for DNA sequencing. 

Access to Information and Materials 

The information produced by genetics research 
has swamped existing management systems. Ma- 
terials to facilitate molecular genetic research have 
also proliferated, straining the resources devoted 
to making them widely available. These manage- 
ment problems will intensify as new technologies 
further accelerate research. Several of the genome 
projects are intended to systematically archive in-
formation, collect and store research materials,
and make information and materials widely avail- 
able to the research community. Improving data- 
base and repository services is imperative 
whether or not other genome projects pro- 
ceed. If genetic mapping and sequencing initia- 
tives are pursued, then databases and repositories 
will be needed even more. Bills have been intro- 
duced to improve coordination of and access to 
molecular biology databases through the National 
Library of Medicine. Each major repository and 
database has its own advisory panel of outside 
scientists. NIH has appointed an internal commit-
tee to report on NIH-supported repositories. Two
international meetings were held in 1987 to dis- 
cuss management of databases that contain DNA 
sequence data. NIH and DOE cosponsored a meet- 
ing on databases and repositories in August 1987, 
and appropriations to DOE and NIH have been 
increased to support databases and repositories. 
Congress has the options of maintaining current
funding levels or increasing funds for database 
and repository services through the current sys- 
tem of agency planning and congressional over- 
sight. Seeking recommendations from an advisory 
committee on how to integrate the development
of databases and repositories with genome proj- 
ects is an additional option. 

Organization of Genome Projects 

Congress could pass legislation to organize hu- 
man genome projects— in fact, bills on organiza- 
tion have dominated discussion in Congress. There 
are four principal choices: 1) to designate a single 
agency to coordinate the projects, 2) to establish 
an interagency task force, 3) to establish a national 
consortium, or 4) to rely on congressional over-
sight of interagency agreement and consultation.

Establishing an interagency task force through 
legislation or encouraging agencies to do so by 
oversight are the least problematic choices. Des- 
ignating a lead agency would be politically trouble- 
some and would risk interruption of ongoing re- 
search programs at one or more agencies. Devising 
a single national consortium to manage the many 
diverse genome projects is likely to prove imprac- 
tical. See chapter 6 for a more detailed discussion 
of these options. 

Designate a Lead Agency 

Congress could choose to designate a lead 
agency to coordinate and provide principal fund- 
ing for genome projects. The chief advantage of 
a lead agency is accountability through clear au- 
thority. The purpose of focusing authority would 
be to reduce duplication of effort, to enhance co-
ordination, and to give Congress a single agency
on which to concentrate oversight. The chief dis- 
advantage is that the difficult political process of 
selecting a lead agency would delay progress and 
diminish overall funding. If line item funding for 
genome projects at the nonlead agency— NIH or 
DOE— were eliminated, then agreement would 
have to be reached to add funds for the lead 
agency. This is a difficult process because it in- 
volves a completely different set of congressional 
committees and subcommittees for each agency. 
The choice of a lead agency would likely precipi- 
tate a protracted battle among agencies and con- 
gressional committees, which could only serve to 
delay projects. Furthermore, activities of NIH, 
DOE, NSF, HHMI, and other organizations are com- 
plementary rather than competitive and duplica- 
tive. Appointing a lead agency could complicate 
planning for the other agencies. As an alterna-
tive, each agency could take the lead in projects
best suited to its mandate and expertise. This 
would result in a task force or consultative ar- 
rangement, discussed below, rather than a single 
lead agency. Designating a lead agency would at- 
tempt to centralize authority, but it is not clear 
that this would improve efficiency, communica- 
tion, or coordination. 

Designation of a lead agency for genome projects
could, paradoxically, diminish rather than enhance
accountability to Congress. This follows from the
organizational structure of congressional commit-
tees. Genome projects supported by NIH, DOE,
and NSF are authorized by several committees and
subcommittees in both the House of Representa- 
tives and the Senate. Currently, each committee 
or subcommittee has independent authority to 
oversee programs in agencies under its jurisdic- 
tion, and interest in human genome projects has 
been high. Designating a lead agency would limit 
most oversight responsibility to a single commit-
tee. Further, a lead agency could not fully cen-
tralize authority, because HHMI is a nongovern-
ment organization. Picking a lead agency would
be politically difficult and is unlikely to occur un- 
less there is strong evidence of the advantages 
of centralized authority for Federal efforts. The 
evidence to date is quite to the contrary: Agen- 
cies are communicating, sharing personnel, using 
compatible peer review procedures, and jointly
funding projects in overlapping areas. 

Designating a lead agency might eliminate plural-
ism in Federal funding of genome projects. An
investigator wishing to pursue a genome project 
can now apply to NIH and DOE, or NIH and NSF 
for funding (depending on the nature of the 
project). If there were a single lead agency con- 
trolling genome projects, the choices would be 
limited, diminishing the pluralistic funding that 
has been a mainstay of American biology. If the 
lead agency had only an administrative role and
did not provide the greatest amount of funds, then
there would be little point in calling it a lead 
agency. 

Congress sets independent budgets for NSF, NIH, 
and DOE through different subcommittees in the 
House and Senate appropriations committees.
With several subcommittees involved, projects 
have alternative sources of support in Congress. 
Designating a lead agency would reduce this flex-
ibility. The danger of pluralism is that different
agencies will duplicate each other's work, will fail 
to cooperate, will fail to identify gaps in research, 
or will receive uncoordinated or inappropriate 
appropriations due to the absence of a clear au- 
thority structure. To date, such funding disarray 
has failed to materialize. There are checks and 
balances in the congressional budget process, 
through the Office of Management and Budget 
(OMB), and through the interagency consultation 
group in OSTP. 

Arguments for a centralized and highly orga- 
nized effort would be stronger if genome projects 
addressed a national health emergency, such as 
AIDS or polio, or if they were aimed at a single 
technical or scientific objective. But genome 
projects are many and diverse. Focused respon- 
sibility may nonetheless become necessary for 
some of them. Mapping, for example, might be
more efficiently done at production centers as 
methods mature, and DNA sequencing might re- 
quire dedicated facilities if the technology 
demands high capital investment or central man- 
agement. If dedicated service centers are estab- 
lished, administration by a single agency or for- 
mal interagency agreement would be necessary 
to ensure standardization and efficiency. Such 
services would only be components of overall ge- 
nome projects, however; integration of the vari- 
ous projects would still be needed. 

If genome projects were neglected or inconspic- 
uous elements in agencies' programs, then the ad- 
vantages of central oversight through a single 
agency would carry more weight. This has not
been the case. Genome projects have been given 
high priority, first by DOE and more recently by
NIH, and there has been extensive media atten-
tion to agencies' management of them. There is
thus little danger in the foreseeable future that
genome projects will receive insufficient attention 
or that mismanagement will escape congressional 
scrutiny. 

The agency most affected by genome projects 
will be the NIH. If Congress finds that the advan-
tages of a lead agency outweigh the disadvantages,
then NIH is the natural choice for lead agency.
This is because biomedical research is NIH's cen-
tral mandate, whereas NSF's and DOE's research
programs include physical as well as life sciences. 
NIH funds over 10 times more genetics research 
than any other government or nongovernment 
organization, and researchers funded by NIH are
the most numerous of the intended beneficiaries
of genome projects. Researchers supported by
DOE, NSF, and other organizations have impor-
tant contributions to make, however, and some
projects fall outside the mainstream of research
supported by NIH. Genome projects that involve
expertise in physical science, engineering, and
other fields outside biomedical research would 
benefit from participation in or leadership by NSF 
or DOE. DOE in particular has vigorously pro- 
moted a Federal program to develop new tech- 
nologies and to create sets of ordered DNA frag- 
ments. Some DOE-supported projects are logical 
extensions of work at the national laboratories, 
and DOE is the natural agency to conduct these. 
If NIH were designated the lead agency, it would 
be important to recognize and plan for the ongo-
ing efforts of DOE.

Establish an Interagency Task Force 

The chief advantage of an interagency task force 
is that it builds on existing research programs and
planning efforts in different agencies and does 
not require a single lead agency. A task force could 
monitor all genome projects, government and non- 
government, obtain scientific advice, foster com- 
munication, and make recommendations to Con- 
gress and the appropriate agencies. Discussion at 
an OTA workshop in August 1987 stressed that 
agencies should have outside scientific advice and 
that advice given to one agency should take into 
account activities supported by other agencies. 
No advisory body exists to carry out this task. The 
chief disadvantage of a task force is that no one 
agency is accountable for the conduct of genome 
projects. 

Creating a task force entails decisions about who
should be represented, how appointments are to 
be made, and where the task force would be lo- 
cated administratively. Legislation could specify 
that it represent government, academic, indus- 
trial, and other relevant expertise and could stipu-
late the terms of membership and the appoint-
ment process. The task force could be made part 
of a government agency (making it in effect the 
lead agency), administratively autonomous, or at- 
tached to an existing quasi-governmental institu- 
tion such as the National Academy of Sciences. 
Several bills to establish such coordination and
advisory groups have been introduced in the 100th 
Congress and are likely to be acted upon in 1988. 

Create a National Consortium 

A consortium would involve one or more agen- 
cies in concert with private partners to support 
genome projects. The chief advantages of a con-
sortium are administrative flexibility, possible
funding by private firms to reduce government 
funding, and direct involvement of industrial 
partners— which would presumably hasten tech- 
nology transfer. Some potential disadvantages are 
unclear lines of authority (caused by competing 
needs of government and private partners) and 
statements by the private sector that genome
projects should be funded exclusively by the Fed-
eral Government (e.g., a poll taken by the Indus-
trial Biotechnology Association). Accountability
would be complicated in two respects. First, there 
are many genome projects, and it is difficult to 
imagine a single consortium that could oversee 
them all. Second, the possible commingling of gov- 
ernment and nongovernment funds could prove 
troublesome. Consortia might nonetheless be 
formed for specific tasks. Some genome projects 
in technology development will undoubtedly be 
of great interest to industry and might attract pri- 
vate funding. Such projects (e.g., developing auto- 
mated DNA mapping instruments or DNA detec- 
tion methods) are likely to be highly focused, 
however, and organized at the local rather than 
the national level. Accountability would not be 
as diffuse for local consortia focused on specific
technical objectives as for a single national con- 
sortium with multiple objectives and dozens of 
projects to manage. 

The Technology Transfer Act of 1986 (Public 
Law 99-502) grants government agencies author- 
ity to form consortia with private corporations 
and provides guidelines for doing so. President 
Reagan's Executive Order 12591 (April 1987) fur- 
ther extends this authority and encourages fed-
erally owned laboratories to form consortia. Agen-
cies thus have the requisite authority already. If
Congress finds terms of the 1986 bill inappropri- 
ate in some details— for example, regarding pat- 
ent policies or royalty arrangements— then the 
statute could be amended or special measures re-
lating to genome projects could be added as
amendments to other bills. 

One bill introduced early in the 100th Congress 
would have established a national consortium spe- 
cifically to manage genome projects, but the bill 
has since been replaced by one that establishes 
a new advisory body (covered above as a task 
force). A national consortium is not the only, and 
perhaps not the most effective, way to obtain in- 
dustrial input for genome projects and to facili-
tate technology transfer. Alternatives are to en-
courage agencies to participate in the formation
of local consortia; to facilitate exchange of indus-
trial and academic expertise through training ex-
change programs, symposiums, and other mech-
anisms; and to include industrial representation
on any national advisory groups. 

Rely on Congressional Oversight

If Congress takes no explicit action, several out- 
comes are possible. Federal agencies could con- 
tinue planning processes similar to those followed 
in 1986 and 1987, consisting of informal commu- 
nication and coordination through an interagency 
group with members from NIH, DOE, NSF, OSTP, 
OMB, and other agencies. To date, NIH, DOE, and 
NSF have sought outside advice from various 
standing advisory committees, a practice that has 
resulted in conflicting recommendations. This 
problem could be remedied without legislation: 
The agencies could establish a single interagency 
advisory committee of outside experts appointed 
by the agencies or by a third party, such as the 
National Academy of Sciences or a private philan- 
thropy. The advisory committee could report to 
the agencies directly. 

A Committee on Life Sciences is forming in 
OSTP. The interagency nature and conspicuous- 
ness of genome projects make them a natural topic 
for this committee. OSTP is considering the crea- 
tion of a special subcommittee on genome projects. 



Whether OSTP's efforts meet the objectives 
desired by Congress will depend on effective co- 
ordination and an appropriate balance among gov- 
ernment, university, industrial, philanthropic, le- 
gal, bioethical, and other representatives on the 
subcommittee. If OSTP's subcommittee is com- 
posed exclusively of government representatives, 
then its primary function will be interagency com- 
munication. The main stumbling block to inter- 
agency planning to date has been conflicting ad- 
vice from outside advisory bodies, not lack of 
interagency communication. Pluralism in fund- 
ing is usually a virtue, but making conflicting rec- 
ommendations to different agencies is not. Any 
national coordinating group should take a global 
view of activities in all agencies and harmonize 
the advice given them. 

The chief advantage of relying solely on con- 
gressional oversight is that it requires no new leg-
islation. One disadvantage is that interagency
agreement on appointments and operating 
budgets for a coordinating body might prove dif- 
ficult without a congressional mandate and might 
not initially include an appropriate range of non- 
government experts. Another potential disadvan- 
tage is that initiatives undertaken by an adminis- 
tration in the absence of legislation could crumble 
under the weight of later interagency disagree- 
ments or neglect by a subsequent administration. 
Flexibility is beneficial if projects are short-lived, 
but genome projects are not. Long-term stability 
is essential to the efficient conduct of genome 
projects because they will require sustained sup- 
port over many years. Oversight of agency action 
could nonetheless be all that is required. Deficien- 
cies of a task force set up by agencies could later 
be modified indirectly through congressional over- 
sight or threat of legislation. 

Technology Transfer 

Congress appropriates funds to support scien- 
tific research for several reasons, the principal 
one for biomedical research being to improve 
health. Increasingly, however, biomedical research 
is being regarded as a national investment, and 
policies to facilitate economically fruitful appli- 
cations of new knowledge are receiving attention 
in Congress. The process of exploiting new knowl-
edge for practical purposes is called technology
transfer. Some persons favor increased funding
for genome projects because they believe the 
projects will lead to marketable products (instru- 
ments, research materials) or will accelerate re- 
search in areas that will later yield marketable 
products. Technology transfer can be improved 
through patent policies, exchange of industrial and 
academic personnel, symposiums for industrial 
and academic scientists, formation of consortia 
to develop specific technologies or services, and 
engaging industry in planning genome projects. 
Programs for exchanging personnel and spon- 
soring symposiums will fall to agencies through 
normal policy paths and can be monitored by Con- 
gress. Consortium formation and industry rep- 
resentation on planning bodies have been dis- 
cussed above. The remaining policy area is patent 
and copyright law. 

Patent policies of Federal agencies have changed 
dramatically during the past decade. The Patent 
and Trademark Amendments of 1980 (Public Law 
96-517), as amended in 1984 (Public Law 98-620), 
were devised to facilitate commercialization of fed-
erally sponsored research. President Reagan is-
sued directives to Federal agencies in February
1983 and April 1987 to this same end. And Con- 
gress passed the Technology Transfer Act of 1986 
(Public Law 99-502), which contains patent licens- 
ing and joint venture provisions with authority 
to form consortia with private interests. These 
patent policies, following outlines of policies pi- 
oneered by NIH and NSF in the late 1970s, en- 
courage institutions receiving Federal grants or 
contracts to patent products and processes result- 
ing from federally funded work. A 1987 General 
Accounting Office report judged that the policies 
have increased patenting of research results. 

Aside from a possible change regarding DOE 
policies (see ch. 8), genome projects raise no new 
questions of patent or copyright law. Genome 
projects would be subject to the same statutes and 
executive orders as other scientific efforts. There
is a clear role for congressional oversight, how- 
ever, in ensuring that data are shared promptly 
and fully. 

In mid-1987, proposals to form private corpo-
rations to map and sequence the human genome
stirred a controversy. Scientists expressed con- 
cern that scientific exchange would be impeded 
by such efforts and that information would be 
sequestered through copyrights and patents. If 
private corporations do form to develop map and 
sequence data and research materials, they will 
operate at private expense. If they are success- 
ful, scientists will have new information, services, 
and materials available for a price. If they fail, sci- 
entists should be no worse off, unless the gov- 
ernment fails to support work it would otherwise 
have funded. To date, government agencies have 
not dropped plans for genome projects because 
of corporate efforts. 

Corporate efforts need not entail restricted ac- 
cess to information. Corporations can provide 
services not appropriately performed by labora- 
tories conducting basic scientific research (e.g., 
mapping, sequencing, or database management). 
Universities and large corporations can manage 
research facilities, such as the national labora- 
tories, under contract. Corporations could also 
participate in consortia focused on specific tech- 
nical objectives. Private firms could be given grants 
to develop new methods under the Small Busi- 
ness Innovation Research program; they would 
retain title to inventions, but they would have the
same obligation to share data and materials as 
universities or other grantees. The essential point 
is not whether a grantee or a contractor is a 
university or corporation, but whether the re- 
search results will be widely shared. 

It is essential to ensure timely exchange of data 
and materials from federally sponsored projects. 
Maps, databases, and repositories will be useful 
only if they are accurate and complete; they will 
be complete only if all participants make prompt 
contributions. In most cases, patent requirements 
should not substantially delay disclosure of data. 
Many data will not be relevant to a patentable in- 
vention. When research results do include a pat- 
entable invention, advance planning for filing pat- 
ent applications should minimize delays. The main 
option for Congress in this area is to oversee the 
conduct of genome projects. Changes in agency 
policies for data exchange could be made if prob- 
lems emerge. 

Congress could also direct agencies to make it
easier for persons receiving Federal grants or con-
tracts to understand patent policies in the United
States and abroad. At present, many of the pub- 
lished guidelines and regulations for NIH, DOE, 
and NSF are out of date. Investigators contemplat- 
ing genome projects will probably contact more 
than one Federal agency for research support;
it would be helpful to have a document summariz-
ing the practices of the different agencies. Such
a document could also explain the benefits of fil- 
ing patents early and outline procedures for 
patenting abroad.

Questions for Congressional 
Oversight 

Congressional oversight will most often involve 
an informal exchange among congressional staff, 
executive agency personnel, and other interested 
parties. Oversight can be a potent incentive for
cooperation among agencies and for good con- 
duct of executive actions. Congress may wish to 
hold hearings from time to time to address such 
questions as: Are genome projects being efficiently 
administered? Are agencies duplicating efforts on 
genome projects? Are agencies communicating ef- 
fectively? Are agencies ensuring that access to 
shared data is relatively easy and fair? Are data- 
bases receiving the information they need to be 
most useful (e.g., map and sequence data)? Are 
commercial opportunities being exploited? Are 
shared research resources being neglected? Are
issues of special interest to Congress, such as so- 
cial and ethical implications of genome projects, 
being adequately addressed? Do genome projects 
supported by Federal agencies reflect national 
needs and social priorities? Are foreign govern- 
ments funding a proportionate share of genetics 
research and the research infrastructure? Are for-
eign governments sharing data and materials to
the same degree as U.S. agencies? 

ORGANIZATION AND FUNCTION OF GENETIC INFORMATION



What Is a Genome?

The fundamental physical and functional unit 
of heredity is the gene. Genetics is the study of 
the patterns of inheritance of specific traits. The 
chemical bearer of genetic information is deoxy-
ribonucleic acid (DNA). The DNA of multicellular
organisms such as insects, animals, and human
beings is associated with protein in highly con-
densed microscopic bodies called chromosomes.
A single set, or haploid number, of chromosomes
is present in the egg and sperm cells of animals
and in the pollen cells of plants. All body cells,
or somatic cells, carry a double set, or diploid num-
ber, of chromosomes, one originating from each
parental set. The entire complement of genetic 
material in the set of chromosomes of a par- 
ticular organism is defined as its genome. 

How Are Genomes Organized? 

Long before genetic material was identified as
DNA, maps of genes on chromosomes were con- 
structed, and many of the details of transmission 
of genes from generation to generation were elu- 
cidated [Judson, see app. A]. The gene for color- 
blindness, for example, was assigned to the hu-
man X chromosome in 1911 (80), about 40 years
before the discovery of the structure of DNA. In
fact, it has been known for nearly a century that 
the genetic material: 

• has a structure that is maintained in stable 
form, 

• is able to serve as a model for replicas of itself, 

• has an information code that can be ex- 
pressed, and 

• is capable of change or variation. 

Each of these features can be described in molecu- 
lar terms based on the structure and function of 
DNA. 

To know how DNA controls cell function, and 
ultimately the structure and function of an en- 
tire organism, it is necessary to understand its
structure. In multicellular organisms, DNA is gen- 
erally found as two linear strands wrapped around 
each other in the form of a double helix. A DNA 
strand is a polymeric chain made of nucleotides, 
each consisting of a nitrogenous base, a deoxyri- 
bose sugar, and a phosphate molecule (figure 2- 
1). The arrangement of nucleotides along the DNA 
backbone is called the DNA sequence. There are 
four nucleotides used in DNA sequences: adeno- 
sine (A), guanosine (G), cytidine (C), and thymi- 
dine (T). The two strands of DNA in the helix are 
held together by weak bonds between base pairs 
of nucleotides. In nature, base pairs form only 
between A's and T's and between G's and C's. The 
size of a genome is generally given as its total
number of base pairs.
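
The pairing rule described above (A with T, G with C) can be sketched in a few lines of Python. This is only an illustration: the example sequence and the function names are hypothetical and do not come from the report.

```python
# Base-pairing rule stated in the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for each position of a DNA strand."""
    return "".join(PAIR[base] for base in strand)


def size_in_base_pairs(strand: str) -> int:
    """A double helix has one base pair per position of either strand."""
    return len(strand)


example = "ATGCCGTA"  # hypothetical sequence, for illustration only
print(complement(example))          # TACGGCAT
print(size_in_base_pairs(example))  # 8
```

Because each base has exactly one partner, either strand fully determines the other, which is why genome size can be counted in base pairs.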

A full genome of DNA is regenerated each time 
a cell undergoes division to yield two daughter 
cells. During cell division, the DNA double helix 
unwinds, the weak bonds between base pairs 
break, and the DNA strands separate. Free nucleo- 
tides are then matched up with their complemen- 
tary bases on each of the separated chains, and 
two new complementary chains are made (figure 
2-2). In humans and other higher organisms, DNA
replication occurs in the nucleus of the cell. This
DNA replication process was first proposed in
1953 by Francis H.C. Crick and James D. Watson
(19,73,74).



What Is the Genetic Code?

Most genes carry an information code that speci- 
fies how to build proteins. Proteins are an essen- 
tial class of large molecules that function in the 
formation and repair of an organism's cells and 
tissues. Proteins can be components of essential 
structures within cells, or they can carry out more 
active roles in the overall function of a particular 
cell type. Included in this important class of 
molecules are hormones such as insulin, antibodies
to fight cellular infections, and receptors on
the cell's surface for modulating interactions between
a particular tissue and its surroundings (68).
Enzymes are a specialized group of proteins that 



Figure 2-1. The Structure of DNA

[Figure: diagram of the DNA double helix, with labels for the phosphate
molecule, the deoxyribose sugar molecule, the nitrogenous bases, the
base pairs, and the sugar-phosphate backbone.]

The four nitrogenous bases, adenine (A), guanine (G), cytosine
(C), and thymine (T), form the four letters in the alphabet
of the genetic code. The pairing of the four bases is A with
T and G with C. The sequence of the bases along the sugar-phosphate
backbone encodes the genetic information.

SOURCE: Office of Technology Assessment, 1988.



increase the rate of the biochemical processes that 
take place in metabolism. 

Proteins are long chains of smaller molecules, 
called amino acids, that fold into the unique struc- 
tures necessary for protein function. The infor- 
mation for generating proteins of specific amino 



Figure 2-2. Replication of DNA

When DNA replicates, the original strands unwind and serve
as templates for the building of new, complementary strands.
The daughter molecules are exact copies of the parent, each
daughter having one of the parent strands.

SOURCE: Office of Technology Assessment, 1988
acid sequences is found in the genetic code, a code
based on sequences of nucleotides that are "read"
in groups of three (table 2-1). Genetic information
is transmitted from DNA sequences to protein
via another large molecule called messenger
ribonucleic acid (mRNA). The structure of
ribonucleic acid (RNA) is very similar to that of
DNA. Figure 2-3 illustrates the major steps in gene
expression, namely:



Table 2-1. The Genetic Code

Codon  Amino Acid       Codon  Amino Acid   Codon  Amino Acid      Codon  Amino Acid
UUU    Phenylalanine    UCU    Serine       UAU    Tyrosine        UGU    Cysteine
UUC    Phenylalanine    UCC    Serine       UAC    Tyrosine        UGC    Cysteine
UUA    Leucine          UCA    Serine       UAA    stop            UGA    stop
UUG    Leucine          UCG    Serine       UAG    stop            UGG    Tryptophan
CUU    Leucine          CCU    Proline      CAU    Histidine       CGU    Arginine
CUC    Leucine          CCC    Proline      CAC    Histidine       CGC    Arginine
CUA    Leucine          CCA    Proline      CAA    Glutamine       CGA    Arginine
CUG    Leucine          CCG    Proline      CAG    Glutamine       CGG    Arginine
AUU    Isoleucine       ACU    Threonine    AAU    Asparagine      AGU    Serine
AUC    Isoleucine       ACC    Threonine    AAC    Asparagine      AGC    Serine
AUA    Isoleucine       ACA    Threonine    AAA    Lysine          AGA    Arginine
AUG    Methionine       ACG    Threonine    AAG    Lysine          AGG    Arginine
       (start)
GUU    Valine           GCU    Alanine      GAU    Aspartic acid   GGU    Glycine
GUC    Valine           GCC    Alanine      GAC    Aspartic acid   GGC    Glycine
GUA    Valine           GCA    Alanine      GAA    Glutamic acid   GGA    Glycine
GUG    Valine           GCG    Alanine      GAG    Glutamic acid   GGG    Glycine

Each codon, or triplet of nucleotides in RNA, codes for an amino acid.
Twenty different amino acids are produced from a total of 64 different
RNA codons, but some amino acids are specified by more than one codon
(e.g., phenylalanine is specified by UUU and by UUC). In addition,
one codon (AUG) specifies the start of a protein, and three codons
(UAA, UAG, and UGA) specify the termination of a protein. Mutations
in the nucleotide sequence can change the resulting protein structure
if the mutation alters the amino acid specified by a codon or if it
alters the reading frame by deleting or adding a nucleotide.

U = uracil (thymine)    A = adenine
C = cytosine            G = guanine

SOURCES: Office of Technology Assessment and National Institute of General Medical Sciences, 1986

Figure 2-3. Gene Expression

[Figure: diagram of gene expression; the surviving label reads
"tRNA Bringing Amino Acid to Ribosome".]

In the first step of gene expression, messenger RNA (mRNA) is synthesized, or transcribed, from genes by a process somewhat
similar to DNA replication. In higher organisms, this process takes place in the nucleus of a cell. In response to certain signals
(e.g., association with a particular protein), sequences of DNA adjacent to, or sometimes within, genes control the synthesis
of mRNA. Protein synthesis, or translation, is the second major step in gene expression. Messenger RNA molecules are known
as such because they carry messages specific to each of the 20 different amino acids that make up proteins. Once synthesized,
mRNAs leave the nucleus of the cell and go to another cellular compartment, the cytoplasm, where their messages are translated
into the chains of amino acids that make up proteins. A single amino acid is coded by a sequence of three nucleotides
in the mRNA, called a codon. The main component of the translation machinery is the ribosome, a structure composed of
proteins and another class of RNAs, ribosomal RNAs. The ribosome reads the genetic code of the mRNA, while a third kind
of RNA molecule, transfer RNA (tRNA), mediates protein synthesis by bringing amino acids to the ribosome for attachment
to the growing amino acid chain. Transfer RNAs have three nucleotide bases that are complementary to the codons in the
mRNA (see table 2-1).

SOURCE: Office of Technology Assessment, 1988
• transcription of DNA into mRNA; and

• translation of mRNA into protein. 

By these processes, the genetic code directs amino 
acids to be joined together in the order specified 
by the sequence of nucleotides in the messenger 
RNA, which was in turn determined by the se- 
quence of nucleotides in the DNA. 
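
The two steps of gene expression can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the input is taken to be the coding strand of the DNA (so the mRNA is the same sequence with U in place of T), amino acids are written in the standard one-letter code, and the example sequences and function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons, ordered by first, second,
# and third base (each in the order U, C, A, G); "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_strand: str) -> str:
    """Transcription (simplified): mRNA matches the coding strand, with U for T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGTTTGGCTAA"      # hypothetical coding sequence
shifted = "ATGTTGGCTAA"    # same sequence with one T deleted (frameshift)
print(translate(transcribe(gene)))     # MFG
print(translate(transcribe(shifted)))  # MLA
```

The second call shows the effect of a reading-frame change: deleting a single nucleotide alters every codon downstream of the deletion.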

In molecular terms, a gene is a region of a
chromosome whose DNA sequence can be
transcribed to produce a biologically active
RNA molecule. Messenger RNAs constitute the
major class of biologically active RNAs. Other RNAs 
may act as lattices to stabilize certain cell struc- 
tures or may participate directly in important cel- 
lular processes such as protein synthesis. 

How Big Is the Human Genome? 

The diploid human genome consists of 46 chromosomes:
22 pairs of autosomes and 1 pair of
sex chromosomes (two X chromosomes for females
and an X and a Y chromosome for males).
A single egg cell has 22 different autosomes and
a single X chromosome, whereas sperm cells carry
22 different autosomes and either an X or a Y chromosome.

Scientists estimate the total number of human 
genes per haploid genome at 50,000 to 100,000. 
The characterization of the structure of human
genes on chromosomes was made possible recently
through recombinant DNA technology (the
use of molecular biology tools to combine DNA
from one organism with that of another). It is now
known that human genes can vary in size from
fewer than 10,000 base pairs to more than 2 million.
The entire haploid genome is approximately
3 billion base pairs. So far, researchers are far
from having determined where each human gene
is located on the 24 chromosomes. Victor McKusick
of The Johns Hopkins University maintains
Mendelian Inheritance in Man, an encyclopedia
of expressed genes [see app. D]. According to the
October 1, 1987, count, 4,257 genes were repre- 
sented in the encyclopedia; of those, at least 1,200 
had been mapped to specific chromosomes or re- 
gions of chromosomes (51). Figure 2-4 illustrates 
the years of effort invested thus far in identify- 
ing even this small fraction of the total number 
of human genes. 



How Does the Human Genome
Compare to Other Genomes?

Before much was known about the DNA se- 
quences that make up genomes, it was thought 
that the amount of DNA per haploid genome 
would increase in proportion to the biological com- 
plexity of the organism. Since chromosomes can 
vary in size, the total amount of DNA in a haploid 
cell is a better indicator of actual genome size than 
the number of chromosomes. Table 2-2 shows that 
higher plants and animals do have much more 
DNA than lower organisms. There are some nota- 
ble exceptions, however, to the correlation be- 
tween overall genome size and complexity of the 
organism. A good example is the salamander, 
which has a haploid DNA content more than 30 
times greater than that of humans, even though 
it is obviously a smaller, less complex organism. 
Similarly, the cells of some species of plants have 
a greater DNA content than human cells (72). 
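
The comparison can be checked with simple arithmetic on the figures given in table 2-2. The sketch below is illustrative only; the dictionary and function names are hypothetical.

```python
# Haploid DNA content in millions of base pairs, as listed in table 2-2.
HAPLOID_MBP = {
    "Bacterium": 4.7,
    "Yeast": 15,
    "Nematode": 80,
    "Fruit fly": 155,
    "Chicken": 1_000,
    "Human": 2_800,
    "Mouse": 3_000,
    "Corn": 15_000,
    "Salamander": 90_000,
    "Lily": 90_000,
}


def ratio_to_human(organism: str) -> float:
    """How many times larger (or smaller) a genome is than the human genome."""
    return HAPLOID_MBP[organism] / HAPLOID_MBP["Human"]


for name in ("Salamander", "Lily", "Corn"):
    print(f"{name}: {ratio_to_human(name):.1f} times the human genome")
```

On these figures the salamander and lily genomes are each roughly 32 times the size of the human genome, consistent with the "more than 30 times" stated in the text.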

This inconsistency between DNA content and
the apparent complexity of an organism is known
to geneticists as the C-value paradox (C-value refers
to the haploid genome size). A great deal of research
has been devoted to determining the scientific
basis for the C-value paradox. Variations

Figure 2-4. Number of Human Gene Loci Identified
From 1958 to 1987

[Figure: chart of the number of loci identified; the axis runs from 1,000 to 5,000.]

SOURCE: Victor McKusick, The Johns Hopkins University Medical School,
Baltimore, MD.
Table 2-2. Haploid Amounts of DNA in
Various Organisms

Organism       Number of base pairs (millions)
Bacterium                  4.7
Yeast                     15
Nematode                  80
Fruit fly                155
Chicken                1,000
Human                  2,800
Mouse                  3,000
Corn                  15,000
Salamander            90,000
Lily                  90,000

SOURCES:
B. Alberts, D. Bray, J. Lewis, et al., Molecular Biology of the Cell (New York, NY:
Garland Publishing, 1983).
C. Burks, GenBank, Los Alamos National Laboratory, Los Alamos, NM, personal
communication, March 1988.
T. Cavalier-Smith (ed.), The Evolution of Genome Size (New York, NY: Wiley &
Sons, 1985).
J. Darnell, H. Lodish, and D. Baltimore, Molecular Cell Biology (New York, NY:
Scientific American, 1986).



in genome size usually arise from increases in the 
amount of DNA per chromosome, not from in- 
creases in numbers of chromosomes. The ge- 
nomes of all higher organisms contain sequences 
of DNA that occur as large numbers of repeated 
units, either clustered in one chromosomal region
or in regions dispersed throughout the entire ge- 
nome. These repeated sequences contribute to 
wide variations in total DNA content among what 
are often closely related species. 

In large genomes such as the human genome, 
intron sequences also contribute to size. Introns 
are DNA sequences occurring within the coding 
region of a gene. They are transcribed into mRNA, 
but are cut (spliced) out of the message before 
it is translated into protein. Introns can increase 
the number of base pairs in a gene by more than 
tenfold. Many genes also have long regions at their 
ends that are transcribed into mRNA but are not 
translated into protein. In addition, some protein- 
coding genes have given rise to gene families that 
make several closely related protein products. 
Other gene families consist of hundreds or thou- 
sands of closely related genes (72). 

The untranslated sequences within or at the 
ends of genes, gene families, and moderately or 
frequently repeated DNA sequences between 
genes still do not account for all of the DNA in 
the genomes of higher organisms, nor for the variations
in genome size among these organisms.
Many scientists interpret these facts to mean that
some fraction of DNA in the human genome is 
expendable; although there is little agreement on 
the size of this fraction, some believe it to be more 
than 90 percent of the genome (27,54). The impli- 
cation of the C-value paradox, that much of hu- 
man DNA is expendable, is one reason that some 
esteemed scientists do not favor a major effort
to obtain a complete nucleotide sequence of the 
human genome. They believe time would be bet- 
ter spent identifying and understanding the func- 
tion of gene products that contribute to the cellu- 
lar processes leading to the development of an 
organism as complex as man (1). On the other
hand, some scientists consider the C-value para- 
dox to be one of the many mysteries that might 
be unraveled once entire genomes have been ana- 
lyzed in greater detail. 

Why Does Hereditary Information 
Change? 

Hereditary variation is the result of changes
occurring by mutation, a change in the sequence
or number of nucleotides, which occurs during
DNA replication. Mutations formed in sex cells
are inherited by offspring, whereas those that occur
in somatic cells remain only in the affected
organism. Some diseases, such as certain human
cancers, arise from factors in both of these categories.
Mutations are also acquired by artificial
means, such as exposure to chemicals or certain
forms of radiation.¹ Such factors can cause a
change in a single DNA base pair that may mod- 
ify or inactivate a protein, if one is encoded in 
that region of the chromosome. 

More extreme mutations, involving changes in
the structure of a single chromosome or changes 
in chromosome number, can occur; for example: 

• deletion of a chromosome, 

• duplication of a chromosome or a piece of 
a chromosome, 



¹A 1986 OTA report, Technologies for Detecting Heritable Mutations
in Human Beings, addresses the kinds and effects of mutations
in human beings and new technologies for detecting mutations
and measuring mutation rates.
Figure 2-5. Separation of Linked Genes by Crossing
Over of Chromosomes During Meiosis

[Figure: diagram in five stages, labeled as follows.]

Homologous chromosomes come together in pairs before haploid sex cells are
formed in meiosis.

Each chromosome in the pair duplicates itself.

Chromosomes form synapses.

Crossing over upon breaking and rejoining of chromosomes.

Chromosomes with new gene combinations after crossing over.

SOURCE: Office of Technology Assessment, 1988



• translocation, or insertion of a chromosome 
fragment from one chromosome pair into an 
unmatched member of a different pair, and

• inversion, or the breakage of a chromosome 
fragment followed by its rejoining in the op- 
posite orientation. 

In diploid cells, there is a tendency for each DNA
molecule to undergo some form of modification 
or rearrangement with each cell division. The pro- 
genitors of sex cells are a special class of diploid 
cells that undergo two rounds of cell duplication 
in a process called meiosis. Meiosis results in four 
instead of two daughter cells, each with a haploid
set of chromosomes. Before the first meiotic cell 
division, each member of a chromosome pair is 
replicated, forming two sets of chromosome pairs. 
At this stage, the cell has two identical copies of 
chromosomes of maternal origin and two identi- 
cal copies of chromosomes of paternal origin. Also 
at this time, the chromosome pair of maternal ori- 
gin is in close association with that of paternal 
origin, and an event called crossing over can occur;
that is, one maternal and one paternal chromosome
can break, exchange corresponding sections
of DNA, and then rejoin (figure 2-5). (This
process is also referred to as recombination.) In
this way, two of the four resulting sex cells have 
chromosomes with new combinations of genes, 
while the other two cells carry the parental (origi- 
nal) combinations of genes. Since chromosomes 
originating maternally or paternally can carry 
different forms of any given gene, new combina- 
tions of traits are created by such crossovers. 
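
The outcome of a single crossover can be sketched as follows. This is a toy model, not a simulation of meiosis: each chromosome is represented as a string of allele labels (uppercase for maternal, lowercase for paternal), and the function names and the crossover position are hypothetical.

```python
def crossover(maternal: str, paternal: str, point: int) -> tuple[str, str]:
    """Break both chromosomes at `point` and exchange the segments after it."""
    return (maternal[:point] + paternal[point:],
            paternal[:point] + maternal[point:])


def meiosis_products(maternal: str, paternal: str, point: int) -> list[str]:
    """Four haploid products after replication and one crossover.

    Replication yields two copies of each chromosome; the crossover involves
    one maternal and one paternal copy, so the other two stay parental.
    """
    recombinant_1, recombinant_2 = crossover(maternal, paternal, point)
    return [maternal, recombinant_1, recombinant_2, paternal]


print(meiosis_products("ABC", "abc", point=1))
# ['ABC', 'Abc', 'aBC', 'abc']
```

Two of the four products carry new combinations of alleles and two carry the parental combinations, as described in the text.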



GENETIC LINKAGE MAPS 



Because of recombination during meiosis, certain
groups of traits originating on one chromosome
are not always inherited together (figure
2-5). The closer, or more linked, genes are on a
particular chromosome, the smaller the probability
that they will be separated during meiosis. Each
chromosome is inherited independently of all
others, so only genes on the same chromosome
can be linked.

Gene mapping, broadly defined, is the
assignment of genes to chromosomes. A genetic
linkage map permits investigators to locate
one genetic locus relative to another on the basis
of how often they are inherited together.

Strictly speaking, a genetic locus is an identifiable
region, or marker, on a chromosome. The
marker can be an expressed region of DNA (a gene)
or some segment of DNA that has no known coding
function but whose pattern of inheritance can
be determined. Variation at genetic loci is essential
to genetic linkage mapping. The markers that
serve to identify chromosome locations must
vary in order to be useful for linkage studies
in families, because only when the parents
have different forms at the marker locus can
linkage to a gene be followed in their children.
Alleles are the alternative forms of a particular
genetic locus. For example, at the locus for eye
color, there are blue and brown alleles. During
meiosis, all of the genetic loci on a chromosome
remain together unless they are separated by
crossing over between chromosome pairs.

Distance on genetic maps is measured by how 
often a particular genetic locus is inherited 
separately from some marker. This measure of 
genetic distance is called recombination fre- 
quency. The amount of recombination is ex- 
pressed in units called centimorgans. One cen- 
timorgan is equal to a 1 percent chance of a genetic 
locus being separated from a marker due to re- 
combination in a single generation. 
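
As a minimal sketch of this definition, the recombination frequency can be computed from counts of offspring. The counts and function names below are hypothetical, and the sketch assumes the simple equivalence given in the text (1 percent recombination equals 1 centimorgan), which is a reasonable approximation only for closely spaced loci.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring in which locus and marker were separated."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def centimorgans(recombinants: int, total: int) -> float:
    """Genetic distance, taking 1 centimorgan as 1 percent recombination."""
    recombination_frequency(recombinants, total)  # validates the counts
    return 100.0 * recombinants / total


# Hypothetical family data: 3 recombinant offspring out of 100 studied.
print(centimorgans(3, 100))  # 3.0
```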

During the generation of sex cells in human be- 
ings, if a gene and a DNA marker are separated 
by recombination in 1 percent of the cases stud- 
ied, then they are, on average, separated by 1 mil- 
lion base pairs. The relationship between genetic 
map distance (recombination frequencies) and 
physical map distance (measured in DNA base 
pairs) can vary, however, by five- or even tenfold. 
Recombination can vary from near zero, if genetic 
loci are very close, to 50 percent, between genetic 
loci that are far apart on the chromosome or on 
different chromosomes. Some chromosome re- 
gions are highly prone to recombination and ex- 
hibit high recombination frequencies, while other 
chromosome regions appear to be resistant to 
recombination. Interestingly, the rate of recombination
in the same region of a particular chromosome
typically differs between males and females,
and it is often greater in females. The reasons for
this have not been established. Double or multiple
crossover events can also occur between two
loci that are widely separated. Each of these variations
in recombination frequencies complicates
the relationship between genetic and physical 
maps. Nevertheless, if a genetic linkage map were 
constructed with a set of markers separated by 
an average of 1 centimorgan, then most genes 
could be located within a range of 100,000 to 10 
million base pairs. 
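
The range quoted above follows directly from the two figures in the text: an average of 1 million base pairs per centimorgan and up to tenfold variation in either direction. The constants and function name in this sketch are illustrative labels, not established terminology.

```python
AVERAGE_BP_PER_CM = 1_000_000  # average given in the text for human beings
MAX_FOLD_VARIATION = 10        # "five- or even tenfold" variation; upper bound used


def physical_range_bp(genetic_distance_cm: float) -> tuple[float, float]:
    """Plausible physical distance (low, high) for a given genetic distance."""
    average = genetic_distance_cm * AVERAGE_BP_PER_CM
    return average / MAX_FOLD_VARIATION, average * MAX_FOLD_VARIATION


low, high = physical_range_bp(1)
print(f"1 centimorgan: {low:,.0f} to {high:,.0f} base pairs")
```

For a 1-centimorgan spacing this gives 100,000 to 10 million base pairs, matching the range in the text.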

Genetic linkage between two or more observa- 
ble traits can be established with greater certainty 
in large populations. For this reason, large fam- 
ilies are preferred for mapping studies. If two 
genetic loci are closely linked, then their separa- 
tion by recombination during meiosis is unlikely 
and a large family must be studied to determine 
how close they are on the genetic map. As more 
loci are placed on the genetic map, it becomes pos- 
sible to determine the location of a new trait on 
the basis of its inheritance pattern compared to 
two or three others already on the map. The fre- 
quency with which multiple traits are inherited 
together generally must be calculated for many 
individuals over many generations before genetic 
mapping results are statistically significant. 

The X chromosome is particularly amenable to 
linkage analysis because male traits directly re- 
flect the genes on the single X chromosome 
present. For this reason, the genetic linkage map 
of the X chromosome is the most nearly complete 
of all chromosome maps. 

Mapping of genetic loci on autosomes, on the 
other hand, is not as easy, unless the gene is found 
to be linked to a marker that has already been 
mapped through the study of family inheritance 
patterns. The first assignment of a gene to a spe- 
cific autosomal chromosome came in 1968, when 
researchers showed that the Duffy blood group, 
which can be identified in families by biochemi- 
cal methods, is linked to a variation in chromo- 
some 1 (23). About the same time, the feasibility 
of correlating specific genes with particular chro- 
mosomes or chromosome regions by a technol- 
ogy called somatic cell hybridization was demon- 
strated (75). This and other experimental methods 
developed in the 1970s radically advanced the 
study of human genetics, allowing investigators 
to locate autosomal genes on human genetic and 
physical maps [Judson, see app. A] (50,58). 
LINKAGE MAPS OF RESTRICTION FRAGMENT
LENGTH POLYMORPHISMS 



The advent of recombinant DNA technology in 
the 1970s brought about a tremendously useful 
new way to create genetic linkage maps. Exami- 
nation of DNA from any two individuals reveals 
that variations in DNA sequence occur at random 
about once in every 300 to 500 base pairs (37). 
These variations occur both within and outside 
of genes, and most do not lead to functional
changes in the protein products of genes. Kan and
Dozy (40) were the first to demonstrate this phenomenon
experimentally, by showing that one particular
DNA sequence, recognized by the restriction
enzyme HpaI, was lost in certain individuals.
(Restriction enzymes are proteins that recognize
specific, short nucleotide sequences and cut the 
DNA at those sites.) This alteration in the DNA 
correlated with the inheritance of sickle cell 
disease. 

This important discovery led researchers to pro- 
pose that natural differences in DNA sequence 
(polymorphisms) might replace other chemical and
morphological markers as a way to track chro- 
mosomes through a family (5,64). In addition to 
polymorphisms in restriction enzyme cutting sites, 
it is possible to detect differences among individ- 
uals in the number of copies of short DNA se- 
quences repeated in tandem. Polymorphic se- 
quences can occur within a restriction enzyme 
cutting site or between sites. In either case, the 
lengths of DNA fragments generated upon cut- 
ting the DNA with restriction enzymes will vary 
among individuals having different alleles at such 
locations. These polymorphic sequences are thus 
commonly referred to as restriction fragment 
length polymorphism (RFLP) markers. 

In 1983, genetic linkage between a RFLP marker 
on chromosome 4 and Huntington's disease (a neu- 
rological disease that usually strikes its victims by 
the age of 35) was discovered (31), paving the way 
for the general use of RFLPs as markers for genes 
responsible for inherited disorders. The more frequently
a RFLP marker is inherited with the gene,
the more likely it is to be physically close to the
gene, and hence the more useful it is as a gene
marker. The major limitations to the usefulness
of RFLP markers are how polymorphic they are
(how much they vary among individuals), how
many other markers exist in the same region, and 
the extent to which DNA samples of large fam- 
ilies are available for analysis (43,44). 

RFLP Mapping 

A RFLP map is a type of genetic linkage map, 
consisting of markers distributed throughout the 
genome. Construction of the map involves deter- 
mining the linkages between RFLP markers, their 
arrangement along the chromosomes, and the 
genetic distances between them. RFLP markers 
are identified and mapped by comparing the sizes 
and numbers of restriction enzyme fragments 
generated from different individuals. Just as 
genetic loci representing expressed DNA segments 
have alternate, or allelic, forms, so may RFLPs. 
The value of any marker depends mostly on how 
many variants it displays. The more often the 
marker varies in a population, the more likely it 
is that an individual will inherit two different 
alleles at the marker location (one on each mem- 
ber of a matched pair of chromosomes), making 
it possible to detect recombination between mark- 
ers in that individual's offspring (76). 

In RFLP mapping, DNA obtained from white
blood cells (lymphocytes) or other tissues of several
different individuals is first cut into fragments
using restriction enzymes (figure 2-6). The
fragments are then separated by size. This is accomplished
by a procedure called electrophoresis,
in which a mixture of DNA fragments of various
sizes is placed in a polymeric gel (e.g., agarose)
and then exposed to an electric field. Because the
chemical makeup of DNA gives it a net negative
charge, the DNA fragments will travel in an electric
field toward a positive electrode. Large DNA
fragments will move more slowly than small ones,
so the mixture is separated, or resolved, according
to size. With very large pieces of DNA, the
use of restriction enzymes yields numerous fragments
along the entire length of the gel, making
it necessary to identify RFLPs using radioactively
labeled, single-strand segments of DNA called DNA
probes (65). RFLP markers are identified by virtue
of their ability to form base pairs (hybridize)
with DNA probes that have complementary sequences
of nucleotides. Some useful probes for

Figure 2-6. Detection of Restriction Fragment Length
Polymorphisms Using Radioactively Labeled
DNA Probes

[Figure: flow diagram with the following steps.]

Genomic DNA From Three Blood Samples
Agarose Gel Electrophoresis
Denature DNA
Transfer to Nylon Membrane
Add Radioactively Labeled DNA to Nylon Membrane for Hybridization
Expose on X-Ray Film for Autoradiography

Variations in DNA sequences at particular marker sites are
observed as differences in numbers and sizes of DNA fragments
among samples taken from different individuals (shown
here as samples A, B, and C).

SOURCE: Office of Technology Assessment, 1986



RFLP mapping are fragments of genes; others are 
randomly isolated DNA segments that identify 
polymorphisms; still others are complementary 
to sequences with variable numbers of tandemly 
repeated, shorter sequences that occur frequently 
within the genome. A technique called autoradi- 
ography is used to show the image of a band on 
an X-ray film wherever the agarose gel held a re- 
striction enzyme fragment that hybridized to the 
DNA probe. Where polymorphisms occur, differ- 
ent patterns will be observed among samples taken 
from different individuals [Myers, see app. A] (fig- 
ure 2-6). 
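
The way a single base change produces a fragment length polymorphism can be sketched in Python. This is a toy model under stated assumptions: the two allele sequences are invented, the recognition site GTTAAC (the sequence recognized by HpaI, cut in the middle) is used only as an example, and gel electrophoresis is reduced to sorting fragments by length.

```python
def cut_positions(dna: str, site: str, offset: int) -> list[int]:
    """Positions at which the enzyme cuts, `offset` bases into each site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + offset)
        start = dna.find(site, start + 1)
    return positions


def fragment_lengths(dna: str, site: str = "GTTAAC", offset: int = 3) -> list[int]:
    """Lengths of the fragments left after cutting at every recognition site."""
    cuts = [0] + cut_positions(dna, site, offset) + [len(dna)]
    return [end - begin for begin, end in zip(cuts, cuts[1:])]


# Two hypothetical alleles: allele B has lost the second site (T changed to C).
allele_a = "A" * 10 + "GTTAAC" + "C" * 20 + "GTTAAC" + "T" * 8
allele_b = "A" * 10 + "GTTAAC" + "C" * 20 + "GTCAAC" + "T" * 8

# Smaller fragments travel farther in the gel, so sort from small to large.
print(sorted(fragment_lengths(allele_a)))  # [11, 13, 26]
print(sorted(fragment_lengths(allele_b)))  # [13, 37]
```

The two alleles are the same length but yield different fragment patterns, which is what appears as different band patterns on the autoradiograph.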

When Is a Map of RFLP Markers 
Complete? 

Botstein and co-workers (5) predicted in 1980 
that only 150 different markers would be needed 
to link all human genes to chromosomal regions 
containing RFLPs. In practice, however, it has been 
estimated that many more markers may have to 
be studied and evaluated in order to find the min- 
imum number which would be randomly distrib- 
uted over the genome (45). It now appears that 
hundreds of DNA probes for highly polymorphic 
sequences, scattered widely over the genome, will 
be required for a complete human linkage map.

With a 10-centimorgan map, for example, there 
is a greater than 90 percent chance of being able 
to determine the rough chromosomal location of 
any gene associated with an inherited disease. Raymond
White and colleagues at the Howard Hughes
Medical Institute at the University of Utah have
taken advantage of one such tandemly repeated
sequence, known as VNTR, to create a set of
probes useful for making a complete RFLP map
of the human genome (76). White's RFLP map, with
continuously linked landmarks separated on aver- 
age by 10 centimorgans (about 10 to 20 million 
base pairs), is nearly complete. At the ninth inter- 
national Human Gene Mapping Workshop, held 
in September 1987, he reported 475 markers cov- 
ering 17 human chromosomes, based on the DNA 
from 59 different three-generation families. 
White's group and other geneticists believe that 
a 1-centimorgan RFLP marker map, determined
from normal families and consisting of thousands 
of markers spaced an average of 1 million base 



ERLC 



90 



pairs apart, would be the ideal research tool (see 
ch. 3 for further discussion) (17,52). 
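
A rough count of the markers such maps would require can be worked out from figures given in this chapter: a haploid genome of about 3 billion base pairs and the stated marker spacings. The sketch assumes perfectly even spacing, which real markers do not have, so the results are lower bounds; the function name is hypothetical.

```python
import math

HAPLOID_GENOME_BP = 3_000_000_000  # approximate size given in the text


def markers_needed(spacing_bp: int, genome_bp: int = HAPLOID_GENOME_BP) -> int:
    """Minimum number of evenly spaced markers to cover the genome."""
    return math.ceil(genome_bp / spacing_bp)


# 1-centimorgan map: markers about 1 million base pairs apart.
print(markers_needed(1_000_000))   # 3000
# 10-centimorgan map: markers about 10 to 20 million base pairs apart.
print(markers_needed(10_000_000))  # 300
print(markers_needed(20_000_000))  # 150
```

These counts agree with the text: hundreds of markers for a 10-centimorgan map and thousands for a 1-centimorgan map.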

Another group, led by Helen Donis-Keller at Col- 
laborative Research, Inc. (Waltham, MA), reported 
its own RFLP linkage map, consisting of 403 markers
an average of 9 centimorgans apart. A new
gene or marker on their map can be located rela- 
tive to the existing markers 95 percent of the time 
(24). 



As physical markers that can be followed genetically,
RFLPs are the key to linking the
genetic and physical maps of the human ge- 
nome. RFLP linkage maps, as well as linkage maps 
of expressed genes, can be correlated with band- 
ing patterns and other identifiable regions of chro- 
mosomes by somatic cell hybridization and in situ 
hybridization. These and other relatively low reso- 
lution physical mapping technologies are de- 
scribed in the following sections. 



LOW-RESOLUTION PHYSICAL MAPPING TECHNOLOGIES

A physical map is a representation of the locations
of identifiable landmarks on DNA. For the
human genome, the physical map of lowest resolution
is found in the banding patterns on the 22
autosomes and the X and Y chromosomes observable
under the light microscope. This map has at
most 1,000 landmarks (i.e., visible bands) (57).

Another type of relatively low resolution physical
map illustrates the positions of expressed segments
of DNA relative to certain regions of the
chromosome or to specific chromosome bands.
Expressed genes include those that are transcribed
into mRNA and then translated into protein, and
another class of essential genes that are transcribed
into RNA but not translated into proteins.
Included in the latter class are transfer and ribosomal
RNAs involved in protein synthesis, RNAs
involved in the removal of intron sequences from
mRNAs, and an RNA associated with the cellular
protein secretion machinery. Procedures are available
to make DNA copies, or complementary DNAs
(cDNAs), of RNA transcripts. These cDNAs can in
turn be mapped to genomic DNA sequences by
somatic cell hybridization, in situ hybridization,
and other low-resolution physical mapping methods.
A physical map illustrating the location of
expressed genes is often referred to as a cDNA
map. As noted earlier, only 1,200 of the 50,000
to 100,000 human genes have been physically
mapped to chromosomes.

[Photo: Banded pattern of Drosophila melanogaster salivary
gland chromosomes as seen under phase-contrast light microscopy.]

A high-resolution physical map can be made by
cutting up the entire human genome with restriction
enzymes and ordering the resultant DNA segments
as they were originally oriented on the chromosomes.
This third type of physical map, a contig
map, can be related to the maps of chromosome
bands and expressed genes. The physical map
of highest possible resolution, or greatest
molecular detail, is the complete nucleotide sequence
of the human genome. Thus there is a
continuum of mapping techniques that ranges 
from low to high resolution (see table 2-2). These 
techniques are discussed in this section, on low- 
resolution physical mapping, and in the follow- 
ing one, on high-resolution physical mapping 
methods. 

Somatic Cell Hybridization 

The somatic cell hybridization technique for gene
mapping typically employs human fibroblast and rodent
tumor cells grown in culture. The human and mouse
cells are fused (hybridized) together using certain
chemicals, Sendai virus, or an electric field, as
illustrated in figure 2-7 (58). The chromosomes of
each of the fused cells become mixed, and many of the
chromosomes are lost from the hybrid cell. Human
chromosomes are preferentially lost over rodent
chromosomes, but there is generally no preference for
which human chromosomes are lost. The individual
hybrid cells are then propagated in culture and
maintained as cell lines. In practice, the hybrid cell
lines resulting from cell fusions contain different
subsets of between 8 and 12 human chromosomes in
addition to rodent chromosomes (58).

Using a large set (panel) of somatic cell hybrids 
containing different chromosome combinations, 
it is possible to correlate the presence or absence 
of a particular chromosome with a particular gene. 
Assignment of a gene to a chromosome is made 
by detecting a protein produced by a hybrid cell 
line and associating it with the chromosome 
unique to that cell line. Alternatively, if the gene 
to be mapped has already been isolated by DNA 
cloning procedures, then the gene can be used 
directly to identify complementary nucleotide se- 
quences in the DNA extracted from somatic cell 
hybrids. 
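
The panel logic described above can be illustrated with a short sketch. It is only an assumption-laden toy: the hybrid line names, their chromosome contents, and the presence/absence calls for the gene are invented for illustration, and a real panel would have many more lines and would need to tolerate scoring errors. The function simply looks for a chromosome whose presence or absence agrees with the gene signal in every hybrid line.

```python
def assign_chromosome(panel, gene_present):
    """Return the chromosomes whose presence/absence pattern across the
    hybrid panel matches the gene signal in every cell line."""
    all_chromosomes = set().union(*panel.values())
    matches = []
    for chrom in sorted(all_chromosomes):
        # A chromosome is concordant if it is present exactly in the
        # lines where the gene (or its protein product) is detected.
        if all((chrom in panel[line]) == gene_present[line] for line in panel):
            matches.append(chrom)
    return matches


# Hypothetical panel: human chromosomes retained by each hybrid cell line.
panel = {
    "line_1": {"7", "16", "X"},
    "line_2": {"16", "17"},
    "line_3": {"7", "19"},
    "line_4": {"17", "19", "X"},
}
# Hypothetical assay result: is the gene detected in each line?
gene_present = {"line_1": True, "line_2": False, "line_3": True, "line_4": False}

print(assign_chromosome(panel, gene_present))  # ['7']
```

With a well-chosen panel, exactly one chromosome is concordant; several matches would indicate that more hybrid lines are needed to separate the candidates.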

Modifications of the somatic cell hybridization
method have been devised to generate single chromosome
hybrids; to date, hybrid cells containing single
copies of human chromosomes 7, 16, 17, 19, X, and Y
are available (58). Somatic cell hybrid lines carrying
chromosomes with deletion or translocation mutations
are also useful low-resolution mapping tools because
they make it possible to infer the location of a
particular gene.

Chromosome Sorting 

Chromosome sorting offers an alternative to the 
screening of somatic cell hybrid panels for low- 
resolution gene mapping. In this approach, DNA 
hybridization is used to map genes to chromo- 
somes that have been differentiated by flow 



The Institute for Medical Research, in Camden, New Jersey, estab-
lished a repository for SCHs with chromosome rearrangements called
the Human Mutant Cell Library. The availability of this centralized
storage facility has accelerated the rate of mapping human genes
to specific chromosomal locations.
Figure 2-7.—Somatic Cell Hybridization

[Figure labels: human fibroblast; Sendai virus; mouse tumor cell]




Somatic cell hybrids are generated by the process of cell fu- 
sion, an event that can be enhanced by adding Sendai virus. 
Initially, the hybrid cell contains complete sets of chromo-
somes from both parental cells, but hybrids of human and
mouse cells are unstable and chromosomes from the human 
cells are preferentially lost. After a few generations in cul- 
ture, a line of hybrid cells is established that contains both 
mouse and human chromosomes. 

SOURCE: Office of Technology Assessment, 1988.
Figure 2-8.—Chromosome Purification by
the Flow Sorter



Photo credit: Larry Deaven, Los Alamos National Laboratory, Los Alamos, NM

Flow cytometry facility for chromosome sorting at 
Los Alamos National Laboratory. 



cytometry and purified by flow sorting. Fluorescent
markers that bind to chromosomes are used in flow
cytometry as the basis of separating chromosomes from
one another in a flow sorter (figure 2-8)
(21,29,30,46). Because human chromosomes differ in the
degree to which they bind the fluorescent markers, it
is possible to use this approach to physically
separate some chromosomes from others. The dual-laser
chromosome sorter has been used successfully to
separate all the human chromosomes except chromosomes
10 and 11. In addition, chromosomes from cell lines
with translocations and deletions can be used to
narrow the location of the gene to a certain
chromosomal region (46).

To determine on which chromosome a gene lies,
chromosomes are sorted onto different paper filters
made of nitrocellulose. There the DNA is denatured and
hybridized with a radioactively labeled DNA probe
complementary to the gene to be mapped. (In general
the cDNA is available for use as a probe for the
gene.) On whichever chromosome the gene appears, the
two sequences will hybridize, and the hybridization
can be observed using autoradiography.

Karyotyping 

At a stage of cell division when chromosomes 
have duplicated but not yet separated from one 
Chromosomes stained with a fluorescent dye are passed
through a laser beam. Each time, the amount of fluorescence
is measured and the chromosome deflected accordingly. The
chromosomes are then collected as droplets.

SOURCE: Courtesy of Los Alamos National Laboratory, Los Alamos, NM.



another, they condense to form structures with
features that can be observed under a light
microscope. The structure of human chromosomes can be
studied by chemically fixing white blood cells at the
appropriate stage of cell division and then
photographing the chromosome spreads as they appear on
slides under the microscope. Individual chromosomes
are identified in the photograph, cut out, and, in the
case of autosomes, matched with their morphologically
identical chromosome partner to generate a karyotype.
Karyotyping has been most useful for correlating gross
chromosomal abnormalities with the characteristics of
specific diseases (e.g., Down's syndrome and Turner's
syndrome).

Assignment of a gene and genes with related sequences
to specific human chromosomes. Samples of the 21
different human chromosomes were sorted onto 11
circular filters and then hybridized to a
radioactively labeled DNA probe from the aldolase gene
(aldolase is an enzyme involved in the metabolism of
sugars). Most of the radioactive signal in the
autoradiograph appears on the filter with chromosome
9, indicating that the complementary DNA sequence is
in that chromosome. The autoradiograph also shows some
hybridization to chromosomes 17 and 10, indicating
that aldolase genes with different, but similar,
sequences are found on these chromosomes.

Chromosome Banding 

Using fluorescent dyes as chromosome specific stains,
Caspersson and others (10-12) developed optical
methods for observing the banding patterns on human
chromosomes. These methods reveal more details of
chromosome morphology than does simple karyotyping.
The bands are chromosomal regions that appear as
stripes on chromosome spreads when viewed under the
light microscope. Each of the 24 different human
chromosomes has a unique banding pattern; thus the
bands can be used to identify individual chromosomes.
Genes can be mapped to specific bands by identifying
differences between the banding patterns on
chromosomes from normal individuals and those on the
chromosomes from an individual with a significant
chromosomal alteration.

Nearly 1,000 distinct bands have been detected 
on the 24 human chromosomes by staining and 
light microscopy, and an average of 100 genes is 
represented in a single band (50). Chromosome 



banding is a useful procedure for finding the gen- 
eral location of a gene, but it does not offer suffi- 
cient resolution to identify the exact position of 
a gene relative to other genes mapped in the same 
region (58). 

In Situ Hybridization 

Family linkage and somatic cell hybridization 
are not direct mapping methods; they are based 
on the correlation between traits and the fre- 
quency of transmission of those traits in families. 
Karyotyping and analysis of chromosome band- 
ing allow a specific trait to be correlated with a 
particular chromosome or a large region of a chro- 
mosome. Advances in molecular biology have 
overcome the limitations of those techniques by 
providing means for more precise mapping of 
genetic markers. One such method is in situ
hybridization of isolated genes or gene fragments 
to chromosomal DNA. 

The in situ hybridization technique was originally
developed by Mary Lou Pardue and Joseph Gall for
detection of genes encoding ribosomal RNAs in
chromosomes from Drosophila salivary glands (56). In
the typical in situ hybridization experiment, the DNA
corresponding to a particular gene or gene fragment is
used to probe for complementary sequences in
chromosomes (28). The
chromosomes to be analyzed are fixed on a micro- 
scope slide, where the DNA strands are chemi- 
cally treated and separated. Next, the radioactively 
labeled DNA probe is mixed with the chromo- 
somes on the slide. Under proper conditions, the 
DNA probe hybridizes with the gene sequence 
wherever it is located on the prepared chro- 
mosomes. 

Results of in situ hybridization can be seen by 
exposing the slides to a photographic emulsion 
for a long period, then analyzing the photographs 
under a microscope. Wherever the radioactively 
labeled DNA strands have paired with complemen- 
tary chromosomal regions, tiny silver grains ap- 
pear. The location of a specific gene can be found 
by counting the number of grains in each region 
and using computer methods to analyze the data 
(58). Although in situ hybridization has been a 
principal method for the mapping of human genes 
to autosomes, higher-resolution methods are
necessary. The procedure is limited to a resolution
of about 10 million base pairs, a substantial portion
of the total length of most chromosomes. Since
many genes could fit into such a region, the exact 
location of the gene of interest must still be de- 
termined precisely (58). 

Other Methods for Mapping Genes 

Several other techniques for mapping human genes are
available, including gene dosage mapping and
comparative mapping of species. In gene dosage
mapping, a correlation is made between the amount of
gene product and the presence of extra genes or the
absence of a gene or chromosome fragment. Biochemical
analysis of cellular contents isolated from an
individual with a particular genetic disease, or from
somatic cell hybrid lines derived from that
individual's cells, is performed to measure amounts of
gene products. The structure of the altered chromosome
(or chromosomes) is then characterized by one or more
of the methods already described.

Comparative mapping of species can provide 
useful human gene mapping information. This is 
particularly true among mammals, where it has 
been demonstrated that different species have sim- 
ilar patterns of gene organization on certain chro-
mosomes [Computer Horizons, Inc., see app. A].
For example, tabulations show that all of the hu- 
man autosomes except chromosome 13 have at 
least two linked genes which are also linked in 
the mouse (35). 

Comparison of the banding patterns of chro-
mosomes from different species has also proved
useful in matching chromosomes between spe- 
cies, even though differences in total numbers of 
chromosomes exist. There is, for example, a strik- 
ing resemblance between chimpanzee and human 

A female with the extra chromosome 21 associated with Down's syndrome.

HIGH-RESOLUTION PHYSICAL MAPPING TECHNOLOGIES



Construction of high-resolution physical maps of whole
genomes involves cutting the component DNA with
restriction enzymes, analyzing the chemical
characteristics of each fragment, and then
reconstructing the original order of the fragments in
the genome. Generally, the DNA fragments to be ordered
are isolated from chromosomes; united with carrier, or
vector, DNA molecules originating from viruses,
bacteria, or the cells of higher organisms; and
introduced into suitable host cells, where the
isolated DNA can be reproduced in large quantities. A
fragment of DNA is said to be cloned when it is stably
maintained as part of a DNA vector in a single line of
cells. A set of clones representing overlapping
segments of DNA encompassing an entire genome is
called a genomic library. In order to make a physical
map, the clones in the genomic library must be ordered
in relation to one another's position on the
chromosome. The following sections describe in more
detail the methods currently available for creating
high-resolution physical maps and their application to
the genomes of specific organisms.

Cloning Vectors as Mapping Tools

Any genome mapping project first requires the
isolation, usually by cloning technologies, of
fragments of chromosomal DNA. Several different types
of cloning vectors have been developed using
recombinant DNA technology:

• Plasmid vectors are circular DNA molecules
of 1,000 to 10,000 base pairs that can carry
additional DNA sequences in fragment inserts
up to 12,000 base pairs (2,4). Plasmids exist
as minichromosomes in bacterial cells (usually
between 10 and 100 copies per cell) and are
separate from the main bacterial chromosome.

• Phage lambda chromosomes are about 50,000
base pairs and can accept foreign DNA inserts
up to about 23,000 base pairs (33,79). Just as
viruses infect human cells, phage infect
bacterial cells and generate hundreds of
descendants.

• Cosmid vectors are plasmids that also contain
specific sequences from the bacterial phage
lambda. Cosmids are about 5,000 base pairs,
but because they contain phage lambda
sequences, they can carry DNA inserts up to
about 45,000 base pairs (figure 2-9) (25,34,36).

• Yeast artificial chromosomes are plasmids
containing portions of yeast chromosomal
DNA that function in replication. These
artificial chromosomes can accommodate foreign
DNA fragment inserts nearly 1 million base
pairs long (6).

Figure 2-9.—DNA Cloning in Plasmids

A comparative alignment of chromosomes from the giant
panda (AME) and the brown bear (UAR). The putative
matches between the whole chromosomes, or chromosome
segments, of each animal were based on the thickness
of stained bands on the chromosomes and on the spacing
and intensity of the bands. This type of molecular
information has been used to establish the phylogeny
of these bears and to demonstrate some of the problems
in using the appearance of animals, instead of their
chromosome structure, in studies of evolution.

Most of the physical mapping work carried out
to date has used bacteriophage and cosmid
cloning vectors because the yeast artificial chro-
mosome vectors have only recently been devel-
oped [Myers, see app. A].
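
To give a rough sense of scale for the insert capacities quoted for each vector type, the sketch below estimates the smallest number of clones that could tile a genome end to end with no overlap. The genome size of 3 billion base pairs for the haploid human genome is an assumption supplied here, as are the names `GENOME_BP`, `MAX_INSERT_BP`, and `min_clones`. A working library would need several times more clones, since overlapping clones are required to order them.

```python
# Assumed haploid human genome size, in base pairs.
GENOME_BP = 3_000_000_000

# Approximate maximum insert sizes (base pairs) quoted in the text.
MAX_INSERT_BP = {
    "plasmid": 12_000,
    "phage lambda": 23_000,
    "cosmid": 45_000,
    "yeast artificial chromosome": 1_000_000,
}


def min_clones(genome_bp, insert_bp):
    """Smallest number of non-overlapping inserts that could cover the genome
    (integer ceiling division)."""
    return -(-genome_bp // insert_bp)


for vector, insert_bp in MAX_INSERT_BP.items():
    print(f"{vector}: at least {min_clones(GENOME_BP, insert_bp):,} clones")
```

The comparison suggests why larger-capacity vectors are attractive for mapping large genomes: the number of clones to be ordered falls in proportion to the insert size.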

Physical Mapping of Restriction 
Enzyme Sites 

With the exception of DNA sequencing, restriction
enzyme mapping is the method that gives the
highest-resolution picture of DNA as it is organized
in a chromosome. Several basic steps are involved in
the construction of this type of physical map for part
or all of a genome:

• purifying chromosomal DNA,

• fragmenting DNA by restriction enzymes,

• inserting all the resulting DNA fragments into
DNA vectors to establish a collection (library)
of cloned fragments, and

• ordering the clones to reflect the original
order of the DNA fragments on the chromosome.

Variations in any of these steps can affect the
resolution of the physical map.

Purification of Chromosomal DNA 

Whole chromosomes are the best source of DNA for
genomic libraries. Mixtures of chromosomes can be
extracted directly from cells, but for organisms with
complex genomes, such as human beings, it might be
desirable to first separate the different chromosomes
and then create sets of clones from the individually
purified chromosomes.

Mixtures of whole chromosomes extracted from human
cells can be sorted by flow cytometry. Somatic cell
hybrid lines carrying one or a few human chromosomes
can also be used as a highly enriched source of
particular chromosomes. The refinement of existing
methods and the development of new technologies for
obtaining large amounts of purified human chromosomes
will be crucial in the early stages of human genome
mapping projects.



Fragmentation of DNA 

The availability of chromosome fragments of decreasing
size allows mapping at higher resolution. A technology
called pulsed field gel electrophoresis (PFGE) allows
separation of DNA molecules ranging in size from
20,000 to 10 million or more base pairs (8,9,13,61)
[Myers, see app. A].

During PFGE, large DNA fragments are subjected to an
electric field that is switched back and forth across
opposite directions for short pulses of time. This
alteration in the direction of the electric field
allows very large DNA molecules (up to tens of
millions of base pairs) to migrate into the agarose
gel and separate from one another, even though the
normal size limit for electrophoretic separation of
DNA molecules during conventional gel electrophoresis
is about 50,000 base pairs. This method is so powerful
that it has been used successfully to separate all 14
of the yeast chromosomes from each other (figure 2-10)
[Myers, see app. A]. Since intact human chromosomes
have an average size of approximately 100 million base
pairs, the PFGE technique can be used to separate
large fragments made from individual, purified human
chromosomes.

The level of detail possible on a physical map depends
on the restriction enzyme or enzymes used. There are
restriction enzymes that cut DNA very infrequently,
generating small numbers of large fragments (ranging
from several thousand to a million base pairs). Most
restriction enzymes cut DNA more frequently,
generating large numbers of small fragments (ranging
from fewer than a hundred to greater than a thousand
base pairs). The relative order of a small set of
large fragments is easier to determine than the order
of a large set of short fragments, but it gives a
lower-resolution physical map. The choice of enzyme
thus depends on the purpose of the physical map. If
the aim is to have fragments of a size amenable to DNA
sequencing, then a mapped restriction site at least
every 500 base pairs would be ideal, but a mapped site
every 2,000 to 5,000 base pairs would also be
practical. Given the technology currently available,
sequencing the DNA of the 3-billion-base-pair haploid
human genome might require the prior mapping of as
many as 6 million restriction enzyme cutting sites
(69) [Myers, see app. A].
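
The estimate of mapped sites follows from dividing the genome length by the desired spacing between sites. The short calculation below reproduces that arithmetic as a sketch, assuming a haploid genome of 3 billion base pairs and treating spacings of 500, 2,000, and 5,000 base pairs as illustrative values; the function name `sites_needed` is introduced here only for the example.

```python
# Assumed haploid human genome size, in base pairs.
GENOME_BP = 3_000_000_000


def sites_needed(genome_bp, spacing_bp):
    """Approximate number of mapped restriction sites if one site is
    mapped every `spacing_bp` base pairs along the genome."""
    return genome_bp // spacing_bp


for spacing_bp in (500, 2_000, 5_000):
    print(f"one site per {spacing_bp:,} bp: "
          f"{sites_needed(GENOME_BP, spacing_bp):,} sites")
```

At one site per 500 base pairs the result is 6 million sites; relaxing the spacing to 2,000 or 5,000 base pairs reduces the count to 1.5 million or 600,000, respectively.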



Figure 2-11.—Constructing a Library of Clones
Containing Overlapping Chromosomal Fragments



Construction of Libraries
of DNA Fragments

For physical mapping projects, it is important to have
as much DNA as needed. The use of cloned DNA fragments
offers this advantage. Fragments of DNA from whole
chromosomes are generally cloned into vectors such as
plasmids, cosmids, phage chromosomes, and artificial
yeast chromosomes. These vectors can be stably
maintained in host cells (bacteria or yeast) that
multiply rapidly to provide the amounts of DNA
necessary for restriction enzyme mapping and DNA
sequencing. DNA fragments are usually cloned by
cutting the vector of choice with a restriction enzyme
and then connecting the newly generated ends of the
vector to the ends of the DNA fragments with the
enzyme DNA ligase. The resulting collection of clones
is called a library. There is no obvious order to the
library, and the relationship between the components
can only be established by physical mapping.

Figure 2-10.—Separation of Intact Yeast Chromosomes
by Pulsed Field Gel Electrophoresis

In order to establish that any two clones represent
chromosomal segments that normally occur next to one
another in the genome, it is necessary to have
collections of clones representing partially
overlapping regions of chromosomal DNA (figure 2-11).
To create libraries of overlapping clones, the
chromosomal DNA is treated with a frequent-cutting
restriction enzyme, one that cuts every 500 base pairs
or so, but conditions are controlled so that the
enzyme is not allowed to cut the DNA at all the
possible restriction enzyme sites. Instead, by
lowering the amount of restriction enzyme used, only
partial cutting is allowed. The experimental
conditions for partial cutting are adjusted so that
DNA fragments are generated with an average size equal
to the vector's capacity (usually 20,000 to 50,000
base pairs). In theory, no one of the cutting sites
will be recognized by the restriction enzyme more
frequently than another, so a population of
overlapping segments representing all possible cutting
sites in the original DNA sample should be generated.
These fragments are then cloned in the appropriate
vector.

Determination of the Order of Clones 

The clones in a library are ordered by subdividing the
chromosomal DNA inserts into even smaller fragments
and identifying which clones have some common
subfragments. Figure 2-12 illustrates how this is
done. A particular DNA clone (vector plus the
chromosomal DNA insert) is cleaved with one or more
restriction enzymes (other than that used to make the
clones) under conditions in which all sites are
recognized and cut. The resulting fragments are then
run on a gel made of agarose. After electrophoresis, a
pattern of fragments is observed along the length of
the gel. If the DNA fragments are present in
sufficient amounts, they can be seen under ultraviolet
light after staining the gel with the dye ethidium
bromide; otherwise, the phosphates at the ends of the
DNA fragments are labeled with a radioactive isotope
and viewed after autoradiography. A unique pattern of
bands appears (corresponding to DNA fragments) for any
given clone because of the unique arrangement of
restriction enzyme sites in the region of the
chromosome from which that clone was derived. If two
clones contain overlapping segments of DNA, then a
portion of the banding pattern for each will be
identical. For example, if the clone order is A-B-C-D,
then the restriction enzyme fragments from clone A
will partially overlap with those from clone B, clone
B fragments with clone C, and so on (figure 2-12).

Figure 2-12.—Making a Contig Map
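
The idea of joining clones that share restriction fragments can be sketched in a few lines of code. This is a minimal illustration under simplifying assumptions, not the procedure used by any particular laboratory: each clone is reduced to a set of fragment sizes, sizes are assumed to be measured exactly, and two clones are treated as overlapping when they share at least `min_shared` fragments. The clone names, fragment sizes, and the threshold are all hypothetical.

```python
from itertools import combinations


def build_contigs(fingerprints, min_shared=2):
    """Group clones into contigs: clones sharing at least `min_shared`
    fragment sizes are linked, and linked clones form one group."""
    parent = {clone: clone for clone in fingerprints}

    def find(clone):
        # Follow parent links to the group representative (union-find).
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in combinations(fingerprints, 2):
        if len(fingerprints[a] & fingerprints[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for clone in fingerprints:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical fragment sizes (base pairs) observed for each clone.
fingerprints = {
    "A": {1200, 3400, 800, 5100},
    "B": {800, 5100, 2300, 950},
    "C": {2300, 950, 4100, 670},
    "D": {4100, 670, 1500},
    "E": {7200, 310, 2900},
}

print(build_contigs(fingerprints))  # [['A', 'B', 'C', 'D'], ['E']]
```

In this toy data set, clones A through D chain together through pairwise shared fragments, while clone E shares nothing with the others and remains a separate group.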

Groupings of clones representing overlapping, or
contiguous, regions of the genome are known as contigs
(18,66). On an incomplete physical map, contigs are
separated by gaps where not enough clones have been
mapped to allow the connection of neighboring contigs.
Of all the steps in physical mapping, the connection
of all the contigs is the one that faces the greatest
number of technical problems. Therefore, the time
required to achieve a complete physical map of any
genome is a function of the time required to connect
neighboring contigs.

Physical Mapping of Nonhuman
Genomes 

So far, high-resolution mapping of entire ge- 
nomes has focused on nonhuman organisms. Most 
of the technologies applicable to human genetic 
and physical mapping, therefore, have been de- 
veloped from work on other organisms. Mapping 
of complete genomes is well underway for sev- 
eral species of bacteria and yeast, for the nema- 
tode, and is beginning for the fruit fly. These 
organisms have long served as excellent model 
systems for what are sometimes found to be 
universal genetic and biochemical mechanisms 
governing cell physiology. The technologies em-
ployed in these high-resolution genome mapping
projects range from making contig maps to fine 
mapping by DNA sequencing. 

The Bacterial Genome 

For many years, bacteria (mainly Escherichia coli) and
phage (viruses that infect bacteria) have been
principal research subjects of molecular biologists,
molecular geneticists, and biochemists. Because of the
relative ease of studying gene function in E. coli, it
is the organism whose genetics and biochemistry are
closest to being completely understood. The DNA of
this bacterium is contained in a single circular
chromosome of 4.7 million base pairs (63). The genetic
map of E. coli is quite extensive, with about 1,200 of
the 5,000 or so known genes already cloned (3). In
addition, the nucleotide sequence of over 20 percent
of this bacterial genome is known (26).

Progress on the physical map of the E. coli ge- 
nome is good. Cassandra Smith and co-workers 
at Columbia University made a complete physi- 
cal map of this genome using a restriction enzyme 
called Not I, which cuts DNA only infrequently 
(63). Not I recognizes a sequence eight nucleotides 
long that is expected to occur by chance once 
every 34,000 base pairs. Only 22 Not I sites were
found in the E. coli genome (63).

A higher-resolution physical map of E. coli was
generated by Kohara and colleagues (42) at Nagoya 
University in Japan. These researchers devised 



an innovative, rapid mass-analysis mapping ap- 
proach involving eight different restriction en- 
zymes. In a period of time equivalent to only one- 
half of a person-year, this group generated a high- 
resolution physical map covering 99 percent of 
the E. coli genome, leaving only seven gaps. An
independent effort to generate a high-resolution 
map of the cutting sites for three different, 
frequent-cutting restriction enzymes is also near 
completion at the University of Wisconsin, Madi- 
son (20). 

The work of the researchers in Wisconsin and Japan is
important because it generates an ordered set of
clones. A map indicating the order of a library of
genomic clones is immediately useful to anyone wishing
to examine DNA corresponding to a gene whose position
on the map is known. The physical map is correlated
with the genetic map at many sites in E. coli,
primarily as a result of including in the analysis
clones containing known genes. Kohara and co-workers
demonstrated that the use of large fragments for
connecting groups of clones is not necessary for E.
coli. Because of the computational limitations on
connecting great numbers of small fragments, however,
large-fragment maps, analogous to the Not I map of E.
coli, will no doubt play a significant role in mapping
large genomes, such as the human genome.

The Yeast Genome 

An ongoing project to map the 15 million base 
pairs in the Saccharomyces cerevisiae (baker's
yeast) genome has been described by Olson and 
colleagues (55) at Washington University. These 
researchers initiated the mapping project to fa- 
cilitate the organization of the vast amount of in- 
formation already available on this organism. As 
Olson writes: 

Just as conventional cartography provides 
an indispensable framework for organizing 
data in fields as diverse as demography and 
geophysics, it is reasonable to suppose that 
"DNA cartography" will prove equally useful 
in organizing the vast quantities of molecu- 
lar genetic data that may be expected to ac- 
cumulate in the coming decades (55). 

A large fraction of the S. cerevisiae genome
(about 95 percent) is available in clones that have 
been joined together in over 400 contiguously 
mapped stretches. These contigs are being cor- 
related with a complete large-fragment restric- 
tion map for the yeast genome. These combined 
maps make it possible to construct or identify a 
mapped region 30,000 to 100,000 base pairs in 
length around virtually any starting point, typi-
cally a cloned gene [Mount, see app. A].

The Nematode Genome 

The nematode Caenorhabditis elegans is a popular
organism among developmental biologists because the
origin and function of all 958 cells in the adult
animal are known, offering researchers the opportunity
to study the basis of organismal development. With its
3-day generation time, C. elegans is also suited to
genetic studies. Molecular biologists, interested in
the molecular basis of development, would find an
ordered set of clones from the nematode genome
particularly useful for their work [Mount, see app. A].

Coulson and Sulston at the Medical Research Council in
England initiated a C. elegans mapping project to
provide such tools and to establish communications
among the laboratories working on this organism. Like
the S. cerevisiae genome mapping project, this
resulted in a set of clones that covers most of the
genome (18). One difference is that the C. elegans
clones are put into order by the fingerprinting
method: Distances from each cleavage site for one
enzyme to the nearest site for a second enzyme were
measured, and clones sharing a number of such
distances (measured as lengths of restriction
fragments observed on polyacrylamide gels) were
considered to overlap. This process makes
identification of overlapping regions somewhat easier
(because the information is denser), at the expense of
more precise physical map information. A second
difference is that cosmid clones were used in the
nematode project, while phage clones were used in the
yeast project. Cosmid clones can accommodate larger
DNA inserts than phage clones, but they can also be
less stable, with portions of the inserts becoming
deleted more often (17). At present, over 700 contigs,
ranging from 35,000 to 350,000 base pairs in length
and representing 90 percent of the C. elegans genome,
have been characterized (71).
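
Because fragment lengths read from a gel carry measurement error, a fingerprint comparison of the kind described above has to treat two lengths as the same when they agree within some tolerance. The sketch below shows one simple way such a comparison might be scored. It is an assumption-based illustration rather than the method of the nematode project: the function name `shared_fragments`, the tolerance of 5 length units, and the example lengths are all hypothetical.

```python
def shared_fragments(lengths_a, lengths_b, tolerance=5):
    """Count fragment lengths that two clones have in common, treating
    lengths within `tolerance` of each other as the same band."""
    a = sorted(lengths_a)
    b = sorted(lengths_b)
    i = j = shared = 0
    # Walk both sorted lists together, pairing each band at most once.
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical fingerprints (fragment lengths) for two clones.
clone_1 = [102, 250, 400, 615]
clone_2 = [100, 255, 390, 610, 900]

print(shared_fragments(clone_1, clone_2))  # 3
```

A pair of clones would then be called overlapping only if the count exceeds a chosen threshold; setting that threshold trades missed overlaps against false joins.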



The Fruit Fly Genome 

The genetics of the common fruit fly, Drosophila
melanogaster, are the best characterized of any
multicellular organism. One reason for studying fruit
flies is that it is possible to carry out a saturating
screen to detect mutations of a particular type. In a
saturating screen, every gene that could mutate to
produce the defect being studied is identified. (This
accomplishment is crucial to a complete understanding
of many cellular processes.) The saturating screen
technique allows for a comprehensive genetic analysis
because the entire genome can be examined for the
presence of genes that are involved in a particular
process. The most celebrated example is an exhaustive
study of mutations that are lethal to the fly in its
larval stage (39,53,78) [Mount, see app. A].

Until recently, the physical mapping of the 165
million base pairs in the D. melanogaster genome
had not been undertaken by any one laboratory.
Roughly 500 to 1,000 genomic clones have been
obtained in various laboratories in various
vectors; all of these clones have been localized to a
chromosomal map position by in situ hybridization
to polytene chromosomes (a multicopy set
of D. melanogaster chromosomes unique to its
salivary gland). A listing of these clones is maintained
by John Merriam and colleagues at the University
of California, Los Angeles, and the clones are
made available to all researchers [Mount, see app.
A].

Work by Michael Ashburner and co-investigators
at Cambridge University on a comprehensive
map of overlapping cosmid clones of the D.
melanogaster genome was approved for funding
by the European Economic Community in late
1987. This project is expected to follow the
fingerprinting strategy of the nematode project, with
the important difference that cytological maps
(maps of banding patterns derived from
microscopic analysis of stained chromosomes) of D.
melanogaster chromosomes will be exploited.
First, the technique of microdissection cloning
(whereby DNA is excised from precise regions of
the salivary gland polytene chromosomes and
cloned) will be used to generate region-specific
genomic clones. These microdissection clones are
not of sufficient quality to be used directly, but
they can be used to correlate cosmids in a standard
genomic library with specific chromosomal
regions. This step makes it easier to assemble the
contiguous clones into groups. Finally, the position
of all contigs with respect to the cytological
map will be confirmed by in situ hybridization,
whereby cosmid clones from the various contigs
will be hybridized to salivary gland chromosomes
[Mount, see app. A].

Strategies for Physical Mapping of 
the Human Genome 

It is likely that making contig maps of large
genomes, such as the human genome, will require
a combination of bottom-up mapping and top-down
mapping (55). Bottom-up mapping starts by
making genomic clones, then fragmenting these
clones further to decipher the overlaps necessary
for connecting clones into contigs. Top-down
mapping (e.g., Smith's E. coli map) is of lower
resolution because it is derived from minimal
fragmentation of source DNA. The critical distinction
between the two methods is the size of the
genomic DNA fragments used. Bottom-up
mapping starts with relatively small genomic clones,
while top-down mapping starts with large genomic
clones. The advantage of top-down mapping is that
it offers more continuity (fewer gaps), while the
bottom-up method has higher resolution (more
detail). In formulating strategies for mapping the
human genome, it will be necessary to decide what
level of molecular detail is necessary to begin a
human genome mapping project. Will information-rich
strategies like those used to develop
high-resolution E. coli restriction enzyme maps (20,42)
or the DNA signposts offered by an RFLP map be
the best first-generation human genome maps?

Contig Mapping 

Scientists in the fields of molecular biology and
human genetics who reviewed an OTA contract
report on possible strategies for making contig
maps of the human genome [Myers, see app. A]
favored the following strategy: to map the genome
one chromosome at a time, dividing and subdividing
each chromosome into smaller and smaller
segments before beginning restriction enzyme
mapping and ordering of clones. After subdivision,
restriction maps of these smaller segments
would be determined and the information linked
together to form continuous maps of whole
chromosomes. In principle, this strategy could be
broken down into five consecutive steps:

1. isolation of each human chromosome, 

2. division of each chromosome into a collec- 
tion of overlapping DNA fragments 0.5 to 5 
million base pairs in length, 

3. subdivision and isolation of each of these chro- 
mosomal fragments into overlapping DNA 
fragments about 40,000 base pairs in length, 

4. determination of the order of the 40,000-base- 
pair DNA fragments as they appear in the 
chromosomes and determination of the po- 
sitions of cutting sites for a restriction enzyme 
within each of these fragments, and 

5. use of the mapping information gained in step 
4 to link together each of the overlapping 0.5- 
to 5-million-base-pair fragments isolated in 
step 2 [Myers, see app. A].

The substantial progress made so far on contig 
maps of nonhuman genomes implies that tech- 
nologies already exist to begin construction of a 
global physical map of the human genome. The 
haploid human genome (approximately 3 billion 
base pairs) is at least 30 times larger than that 
of the nematode, the largest genome for which 
comprehensive physical mapping has been at- 
tempted. Sulston predicted that the mapping work 
he and his co-workers have done over the past 
4 years could be repeated within 2 person-years, 
because much of their time was spent devising 
computer methods for data analysis (17). If the 
size of a genome were linearly related to the time 
required to physically map it, then the human ge- 
nome could be mapped to the same degree of com- 
pletion as the nematode genome (90 percent) in 
about 60 person-years. Such calculations are sim- 
plistic, however, because features of the human 
genome other than its size make it potentially more 
difficult to map. For example, some DNA sequences
are repeated frequently throughout the
human genome, in contrast to the nematode ge- 
nome, and these are likely to interfere with the 
physical mapping process. 
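
The back-of-the-envelope estimate above can be reproduced directly. The short Python sketch below assumes, as the text does, that mapping effort scales linearly with genome size, and it uses the text's round figures (a 30-fold size ratio and 2 person-years for the nematode). The function and constant names are ours, and the calculation ignores repetitive sequences and the other complications just noted.

```python
def scaled_effort(reference_person_years, size_ratio):
    """Effort for a larger genome if effort grows linearly with genome size."""
    return reference_person_years * size_ratio

# Figures taken from the text: the human genome is at least 30 times larger
# than the nematode genome, and Sulston estimated that the nematode map
# could be repeated in 2 person-years.
NEMATODE_PERSON_YEARS = 2
HUMAN_TO_NEMATODE_SIZE_RATIO = 30

human_estimate = scaled_effort(NEMATODE_PERSON_YEARS, HUMAN_TO_NEMATODE_SIZE_RATIO)
print(f"Estimated effort for the human genome: {human_estimate} person-years")
```

Because the ratio is stated as "at least 30," the 60 person-years should be read as a lower bound even under the linear assumption.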

Techniques for isolating large chromosomal
fragments should offer solutions to some of
the physical mapping problems expected to
arise from the occurrence of repetitive
sequences in the human genome. The two most
promising methods developed to date are the PFGE
technology (8,9,13,61) and the yeast artificial
chromosome cloning technology (6).

A National Research Council advisory panel on 
mapping and sequencing the human genome rec- 
ommended improvements in technologies for the 
following to facilitate the construction of physi- 
cal maps of large genomes: 

• separating intact human chromosomes; 

• separating and immortalizing identified frag- 
ments of human chromosomes; 

• cloning the cDNAs representing expressed 
genes, especially those that represent rare 
cell-, tissue-, and development-specific
mRNAs; 

• cloning very large DNA fragments; 

• purifying very large DNA fragments, includ- 
ing higher-resolution methods for separating 
such fragments; 

• ordering the adjacent DNA fragments in a 
DNA clone collection; and 

• automating the various steps in DNA map- 
ping, including DNA purification and hybridi- 
zation analysis and developing novel meth- 
ods that allow simultaneous handling of many 
DNA samples (52). 

DNA Sequencing 

Strategies for sequencing the entire human
genome are much more controversial than
those for generating contig maps. Some scientists
favor sequencing only expressed genes
identified with a cDNA map (17). Others propose that
sequencing should continue to be targeted at
specific regions of interest, as is currently done. Still
others hold the view that the whole genome
should be sequenced because it could reveal
sequences with important functions that would
otherwise go unidentified (see ch. 3). The National
Research Council panel proposed first that pilot
programs be conducted with a goal of sequencing
approximately 1 million continuous nucleotides
(which is about five times as large as the
largest continuous stretch of DNA sequenced to
date) (52). Second, improvements in existing DNA
sequencing technologies should be vigorously
encouraged. Finally, extensive sequencing of other
genomes, including the mouse, fruit fly, nematode,
yeast, and bacterial genomes, was recommended
for purposes of comparison (52).

The potential uses of human genome maps and 
sequences will likely dominate strategic decisions 
on which of the possible methods should be used 
to construct them (ch. 3). The strategy currently 
favored, preparing physical maps of individual
chromosomes, requires that decisions be made
on which chromosomes should be mapped first.
Mapping smaller chromosomes first in pilot
projects (e.g., chromosomes 21 and 22) would be
the logical strategy from a technical perspective. 
Alternatively, selecting chromosomes linked to the 
largest numbers of markers for human genetic 
diseases (e.g., chromosome 7 and the X chromo- 
some) might make the impact of genome mapping 
on clinical medicine more immediate. Efforts are 
already underway in a number of U.S. and for- 
eign laboratories (ch. 8) to physically map (at vari- 
ous levels of resolution) human chromosomes 
known to be of general clinical significance or to 
carry genes of specific interest to the research- 
ers involved. Scientists at Los Alamos and Liver- 
more National Laboratories have begun mapping 
chromosomes 16 and 19, respectively. These chro- 
mosomes were chosen for their relatively small 
sizes and number of clinically relevant genetic 
markers. Researchers at Columbia University have 
begun work on a physical map of chromosome 
21 for similar reasons. 

DNA Sequencing Technologies

Two methods for sequencing DNA are standard 
in laboratories today. One technique, developed 
by Fred Sanger and Alan Coulson at the Medical 
Research Council in England (60), uses enzymes
(figure 2-13), while the other, developed by Alan 
Maxam and Walter Gilbert at Harvard University, 
involves chemicals that degrade DNA (figure 2- 
14) (48,49). The two methods differ in the means 
by which the DNA fragments are produced; they 
are similar in that sets of radioactively labeled DNA 
fragments, all with a common origin but terminat- 
ing in a different nucleotide, are produced in the 
DNA sequencing reactions. 

George Church at Harvard Biological Laboratories
has adapted the Maxam and Gilbert DNA
sequencing method in an innovative technology,
called multiplex sequencing, that enables a
researcher to analyze a large set of cloned DNA
fragments as a mixture throughout most of the
DNA sequencing steps. Mixtures of clones are
operated on in the same way as a single sample
in traditional sequencing. This is accomplished by
tagging each DNA clone in the mixture with short,
unique sequences of DNA in the first step and then
deciphering the nucleotide sequence of each
cloned fragment in the final step. Multiplex
sequencing allows the simultaneous analysis of about
40 clones on a single DNA sequencing gel, increasing
the efficiency of the standard procedure by
more than a factor of 10 (14). Church and
co-workers have been applying the multiplex
sequencing strategy to determine the complete
nucleotide sequences of two species of bacteria,
E. coli and Salmonella typhimurium (14).

Figure 2-13.--DNA Sequencing by the Sanger Method

In the Sanger method, a cloned DNA fragment is mixed with
a short piece of synthetic DNA complementary to only one
end (the origin) of the cloned fragment. An enzyme called DNA
polymerase is then used to catalyze the synthesis of a
complementary strand. During the polymerization reaction, a
modified nucleotide, a dideoxynucleotide, is included with a
mixture of the four naturally occurring nucleotides (A, G, T, and
C), one of which is labeled with a radioactive phosphorus
or sulfur atom; growth of the DNA chain stops whenever
the modified nucleotide is inserted. Four separate
reactions, each containing all four normal nucleotides but a
different dideoxynucleotide, can be carried out. A series of
radioactively labeled DNA strands will be made, the lengths
of which depend on the distance from the origin to the
nucleotide position where the chain was terminated. For example,
if a short DNA template has four G's, conditions are set up
such that some molecules will be made with no G
dideoxynucleotide analogs, some will terminate at the fourth G
position, some at the third G position, and so on. Similarly, the
other three dideoxynucleotides will insert infrequently and
randomly at the appropriate positions in the other three
nucleotide-specific reactions. The series of labeled DNA
strands is subsequently analyzed by polyacrylamide gel
electrophoresis. Radioactively labeled DNA is electrophoresed
through a vertical slab of polyacrylamide gel (polyacrylamide
is a polymeric resin in which DNA molecules from 1 to 400
bases long can be separated from one another), an X-ray film
is then placed over the gel and exposed, and the resulting
autoradiograph shows a ladderlike pattern of bands. The
sequencing reactions corresponding to each of the four
different bases are run as four adjacent lanes on the
polyacrylamide gel, and the resulting ladders of bands are read
alternately to give the sequence of the DNA.

SOURCE: Office of Technology Assessment, 1988.
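
The ladder-reading logic of the Sanger method (figure 2-13) can be illustrated with a small Python sketch. It is a simplified model, not laboratory software: it assumes ideal reactions in which chains terminate at every occurrence of the relevant base, and the 10-base strand is a hypothetical example.

```python
def sanger_fragment_lengths(new_strand):
    """Map each base to the lengths of chains that end in that base.

    `new_strand` is the strand synthesized outward from the origin. In the
    reaction containing the dideoxy analog of a given base, chains stop at
    positions where that base occurs, so the fragment lengths are the
    1-based positions of that base.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes

def read_ladder(lanes):
    """Rebuild the sequence by reading bands from shortest to longest."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))

strand = "GATTCGGCAG"  # hypothetical strand containing four G's
lanes = sanger_fragment_lengths(strand)
print(lanes["G"])          # lengths of the chains terminated at a G
print(read_ladder(lanes))  # sequence recovered from the four lanes
```

Each key of `lanes` corresponds to one of the four adjacent gel lanes, and sorting the bands by length mirrors reading the autoradiograph from the bottom of the gel upward.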

The major problem with current DNA sequencing
technology is the large number of DNA
sequences that remains to be determined. Multiplex
is only one of several new sequencing protocols
that could be of great value to large genome
sequencing projects. Church and Gilbert devised a
method related to multiplex sequencing that
allows sequencing directly from genomic DNA (15).
Another method, developed by researchers at
Cetus (Emeryville, CA), involves the selective
amplification of specific DNA sequences without prior
cloning (59). Each of these methods could potentially
eliminate the steps of cloning and DNA
preparation in sequence analysis (41).

Figure 2-14.--DNA Sequencing by the Maxam and Gilbert Method

In the Maxam and Gilbert procedure, chemical reactions
specific to each of the four bases are used to modify DNA
fragments at carefully controlled frequencies. One end of one
strand in a double-stranded DNA fragment is radioactively
labeled, and the labeled DNA is used in each of four separate
reactions and treated with a chemical that specifically nicks
one or two of the four bases in the DNA. When these DNA
molecules are treated with another chemical, the DNA
fragments are broken where the base was nicked and are
destroyed. Just as in the Sanger sequencing method, the
products of the Maxam and Gilbert sequencing procedure are
fragments of varying lengths, each ending at the G, C, T, or
A where the chemical reaction took place. By limiting the
amount of chemicals used in each of the base-specific
reactions so they will react only a few times per molecule, it
is possible to obtain all possible double-stranded DNA
fragments equal in length to the distance from the radioactively
labeled origin to each of the bases. For any given DNA
fragment sequenced, each of the four reactions is electrophoresed
separately, as described in figure 2-13, and the sequencing
patterns determined from the autoradiograph.

SOURCE: Office of Technology Assessment, 1988.

Finally, DNA sequencing methods that do not 
involve either gel electrophoresis or chemical or 
enzymatic reactions have also been proposed. At 
the Los Alamos National Laboratory, researchers 
are investigating ways to use enhanced fluorescence
detection methods in flow cytometry as an
alternative to gel techniques for DNA sequenc- 
ing. Others have suggested scanning tunneling 
electron microscopes to read bases directly on a 
strand of DNA (57,62). 



AUTOMATION AND ROBOTICS IN MAPPING AND SEQUENCING 



The longest single stretch of DNA sequence
determined to date, the genome of the Epstein-Barr
virus, contains fewer than 200,000 base pairs. The
total number of nucleotides sequenced to date
using both chemical and enzymatic sequencing
technologies is about 16 million base pairs
[Computer Horizons, Inc., see app. A]. This is the
current size of GenBank®, the U.S. repository of DNA
sequence data [app. D]. Since GenBank® includes
only reported data, 16 million base pairs
represents a low estimate of the total number of base
pairs sequenced. Reported DNA sequences range
from those of small viruses to those of animals
and plants (table 2-3). So far, less than one-tenth
of 1 percent (1.9 million base pairs) of the nearly
3 billion base pairs in the haploid human genome
has been sequenced and reported (7). The
current DNA sequencing rate is estimated to
generate only about 2 million base pairs per year of
sequence information (7), a powerful incentive for
devising methods of automating the procedures
involved in preparing for and carrying out DNA
sequencing. Some recent reviews (16,38,41,47,57)
provide detailed accounts of the robotic and
automated systems currently available and describe
the kinds of systems being developed or planned
for genome mapping and sequencing.

Table 2-3.--Amount of Genome Sequenced in
Several Well-Studied Organisms

Organism                              Genome size     Percent
                                      (base pairs)    sequenced
Escherichia coli (bacterium)          4.7 million     16
Saccharomyces cerevisiae (yeast)      15 million      4
Caenorhabditis elegans (nematode)     80 million      .06
Drosophila melanogaster (fruit fly)   155 million     .26
Mus musculus (mouse)                  3 billion       .04
Homo sapiens (human)                  2.8 billion     .08

SOURCES:
C. Burks, GenBank®, Los Alamos National Laboratory, Los Alamos, NM, personal
communication, March 1988.
GenBank® Release No. 54, December 1987.
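
The scale of the task implied by the figures quoted in this section can be checked with a few lines of Python. The sketch makes a deliberately crude assumption that is not in the source: that the entire estimated rate of 2 million base pairs per year were devoted to human DNA and never improved. The constant and function names are ours.

```python
HUMAN_GENOME_BP = 3_000_000_000   # "nearly 3 billion" base pairs (text)
HUMAN_SEQUENCED_BP = 1_900_000    # human sequence reported to date (text)
RATE_BP_PER_YEAR = 2_000_000      # estimated current sequencing rate (text)

def percent_sequenced(sequenced_bp, genome_bp):
    """Share of a genome already sequenced, in percent."""
    return 100 * sequenced_bp / genome_bp

def years_to_finish(sequenced_bp, genome_bp, rate_bp_per_year):
    """Years needed to sequence the remainder at a constant rate."""
    return (genome_bp - sequenced_bp) / rate_bp_per_year

pct = percent_sequenced(HUMAN_SEQUENCED_BP, HUMAN_GENOME_BP)
years = years_to_finish(HUMAN_SEQUENCED_BP, HUMAN_GENOME_BP, RATE_BP_PER_YEAR)
print(f"Human genome sequenced so far: {pct:.3f} percent")
print(f"Years remaining at the current rate: {years:,.0f}")
```

Under these assumptions the remaining work comes to roughly 1,500 years, which is the arithmetic behind the incentive to automate.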

Any degree of automation will help lower the 
overall costs of genome projects, both in time and 
in dollars. The primary objective in the use of auto- 
mation is standardization, driven by the need for 
repetitive, highly accurate determinations (41). 
Some of the existing automated devices are de- 
signed for repetitive DNA cloning steps, such as 
the preparation and restriction enzyme cutting 
of cloned DNA samples. Similarly, efforts are be- 
ing made to automate the pouring, loading, and 
running of gels for separating DNA and for se- 
quencing DNA. Many of the steps in physical map- 
ping could be adapted to automation. Cloning pro- 
cedures, DNA probe synthesis, and DNA 
hybridizations are only a few of those being ex- 
plored for application to genome projects. A sys- 
tem that automates some steps in growing DNA 
clones, to be used, for example, as gene probes 
or for DNA sequencing, was recently introduced 
by Perkin-Elmer Cetus Instruments (Norwalk, CT) 
(67). 

The area of automation that has received the
most attention is DNA sequencing. An international
workshop on automation of DNA sequencing
technologies was held in 1987 in Okayama,
Japan, and the proceedings give an extensive
account of the state of the art from an international
perspective (32). There are five steps in the task
of DNA sequence analysis:

1. cloning or otherwise isolating the DNA, 

2. preparing the DNA for sequence analysis, 

3. performing the chemical (Maxam and Gilbert) 
or enzymatic (Sanger) sequencing reactions, 

4. running the sequencing gels, and 

5. reading the DNA sequence from the gel. 

Steps 3 through 5 are the functions most often
performed by the instruments developed as of
early 1988 (16,57,70); however, none of the
companies involved has yet commercialized an
integrated system that performs all of the functions.

In 1986, Applied Biosystems, Inc. (Foster City,
CA) introduced the first commercial, automated
DNA sequencer (16). This instrument was made
using technology developed by Leroy Hood and
co-workers at the California Institute of Technology.
This and similar machines perform steps 4
and 5. The Applied Biosystems, Inc. system is based
on the Sanger sequencing reaction, with
modifications to use different fluorescent dyes instead
of radioactive chemicals to label the primers.
Because the sequencing reaction primers are
individually labeled with different dyes, all
four enzymatic reactions can be run together in
a single lane on the polyacrylamide gel. A laser
activates the dyes, and fluorescence detectors read
the DNA sequence at the bottom of the gel as each
fragment appears. The sequence is determined
directly by a computer (figure 2-15). E.I. du Pont
de Nemours & Co. (Wilmington, DE) introduced
in 1987 an automated system that slightly
modifies the technology used by Hood and Applied
Biosystems, Inc.; this system can potentially reduce
the number of artifacts read by the fluorescence
detectors. Hitachi, Ltd. (Tokyo, Japan) is also
expected to market an instrument that automates
steps 4 and 5, but it too is based on the
fluorescence technology developed by Hood and
colleagues. In early 1988, another U.S. company,
EG&G Biomolecular (Wellesley, MA), began
marketing a machine that automates the same DNA
sequencing and gel-reading methods used manually
in most laboratories. Bio-Rad Laboratories
(Richmond, CA) marketed an instrument that
scans autoradiographs of DNA sequencing gels
and analyzes the data.

Figure 2-15.--Automated DNA Sequencing
Using Fluorescently Labeled DNA
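
The idea of reading four dye-labeled reactions from a single lane can be sketched in Python. This is a toy model under stated assumptions, not a description of any commercial instrument: the dye names, the dye-to-base assignment, the signal values, and the threshold are all hypothetical, and each reading stands for one fragment passing the detector.

```python
# Hypothetical assignment of one fluorescent dye to each base-specific reaction.
DYE_TO_BASE = {"dye_1": "A", "dye_2": "C", "dye_3": "G", "dye_4": "T"}

def call_bases(readings, min_signal=0.5):
    """Call one base per detector reading by picking the brightest dye.

    A reading whose strongest signal falls below `min_signal` is reported
    as "N" (no confident call).
    """
    sequence = []
    for reading in readings:
        dye, signal = max(reading.items(), key=lambda item: item[1])
        sequence.append(DYE_TO_BASE[dye] if signal >= min_signal else "N")
    return "".join(sequence)

# Made-up fluorescence intensities, shortest fragment first.
readings = [
    {"dye_1": 0.05, "dye_2": 0.10, "dye_3": 0.92, "dye_4": 0.04},
    {"dye_1": 0.88, "dye_2": 0.07, "dye_3": 0.11, "dye_4": 0.03},
    {"dye_1": 0.06, "dye_2": 0.09, "dye_3": 0.08, "dye_4": 0.95},
    {"dye_1": 0.20, "dye_2": 0.25, "dye_3": 0.30, "dye_4": 0.22},
    {"dye_1": 0.04, "dye_2": 0.90, "dye_3": 0.05, "dye_4": 0.10},
]
print(call_bases(readings))
```

The fourth reading has no dominant dye and is reported as "N", a stand-in for the kind of artifact that the text says later instruments aim to reduce.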

Most of these DNA sequencing systems are
based on manual enzymatic sequencing reactions,
while the gel running and reading are automated.
Only one commercial enterprise, Seiko of Japan,
has reported automating the chemical or
enzymatic steps (step 3) in the DNA sequencing
protocol (70). In addition, the University of
Manchester Institute of Science (Manchester, England)
has built an automatic reagent manipulating
system to carry out the Sanger sequencing reactions
(47).

Robotics are used to give automation flexibility, 
to extend its capabilities to complex operations 
typically performed by highly skilled laboratory 
workers. Conceivably, laboratory robots would 
allow programmable devices to do physical work 
as well as to process data (41). Several robotic de- 
vices have been designed and used successfully 
by companies involved in the commercialization 
of recombinant DNA products or processes. Ge- 
netics Institute's (Cambridge, MA) Autoprep® Plas- 
mid Isolation System provides small quantities of 
plasmid DNA and vector DNA for DNA sequenc- 
ing (22). Researchers at the same company also 
developed a robot to purify and isolate synthetic
oligonucleotides for use as probes in cloning and 
DNA sequencing (38). 

Technical advances are occurring rapidly and 
simultaneously in biology, robotics, and computer 
science, so it is difficult to predict what the fu- 
ture will bring in the development of automated 
technology. Some yet-to-be-developed technology
could make many of the current physical mapping
procedures obsolete.

"[Physical and genetic maps] will certainly be very useful [but] you have to interpret
that sequence, and that's going to be a lot of work. It will be like having a whole history
of the world written in a language you can't read."

Joseph Gall 
American Scientist 76:17-18, 
February 1988 

INTRODUCTION 



Research efforts aimed at creating genetic link- 
age and physical maps of chromosomes or entire 
genomes are collectively referred to in this report 
as genome projects (see chs. 1 and 2). The goals 
of genome projects are to develop technologies 
and tools for mapping and sequencing DNA and 
to complete maps of human and other genomes. 
Proponents expect that the products and proc- 
esses generated from genome projects will enable 
researchers to answer important questions in bi- 
ology and medicine. Meeting this objective, how- 
ever, will depend on the success of concurrent 
projects aimed at analyzing the information gen- 
erated from mapping genomes. Interpreting ge- 
nome maps will require the combined efforts of 
individuals with expertise in structural biology, 
cell biology, population biology, biochemistry, 
genetics, computer science, and other fields. 

Biology and medicine have already benefited 
from efforts to map and sequence specific genes 
from human and other organisms. Some questions 
might be addressed sooner or better, however, 
if more extensive genetic linkage maps, cDNA 
maps, contig maps, and DNA sequences were
available (figure 3-1). (See ch. 2 for detailed discussion
of the types of genetic linkage and physical maps.) 
Research on inherited and nongenetic diseases, 
the physiology and development of organisms, the 
molecular basis of evolution, and other fundamen- 
tal problems in biology could all be facilitated in 
the long run by genome projects. 

Scientists continue to debate about which ap- 
plications depend on information from maps of 
entire genomes and which require only maps of 
specific regions. The value of a complete DNA se- 
quence of a reference human genome is the most 
hotly contended scientific issue (see box 3-A). Fo- 
cused research has been the mode of molecular 
genetics to date: Scientists have targeted specific 
regions of genomes for intensive study. Many of 
the potential applications of genome mapping sum- 
marized in this chapter have already been and 
will continue to be achieved by targeted research 
projects. Wherever possible, therefore, this chap- 
ter attempts to differentiate between the uses for 
which extensive maps will be necessary and those 
for which partial maps are adequate.

Figure 3-1.--Mapping at Different Levels of Resolution

MAPPING GENES TO WHOLE CHROMOSOMES
  Chromosome Banding
  DNA Hybridization to Somatic Cell Hybrids or Sorted Chromosomes
  In Situ Hybridization
  Karyotyping

GENETIC LINKAGE MAPPING

PHYSICAL MAPPING OF LARGE DNA FRAGMENTS
  Yeast Artificial Chromosome Cloning (Up to 1 Million Base Pairs)
  Pulsed Field Gel Electrophoresis (Up to 10 Million Base Pairs)

PHYSICAL MAPPING OF SMALL DNA FRAGMENTS
  Cosmid Cloning (Up to 45,000 Base Pairs)
  Plasmid and Phage Cloning (Up to 23,000 Base Pairs)



APPLICATIONS IN MEDICINE 



Genome projects have accelerated the produc- 
tion of new technologies, research tools, and basic 
knowledge. At current or perhaps increased levels
of effort, they may eventually make possible con- 
trol of many human diseases— first through more 
effective methods of detecting disease, then, in 
some cases, through development of effective ther- 
apies based on improved understanding of dis- 
ease mechanisms. Advances in human genetics 
and molecular biology have already provided in- 
sight into the origins of such diseases as hemo- 
philia, sickle cell disease, and hypercholester- 
olemia. 

The new technologies for genetics research will
also help in the assessment of public health needs.
Techniques for sequencing DNA rapidly, for
example, should permit the detection of mutations
following exposure to radiation or environmental
agents. Susceptibilities to environmental and
workplace toxins might be identified as more detailed
genetic linkage maps are developed, and special
methods of surveillance can be used to monitor
individuals at risk. By providing tools for
determining the presence or absence of pathogens (e.g.,
bacteria and viruses) in large numbers of individuals
as well as identifying genetic factors that render
some human beings more susceptible to
infection than others, genome projects might also
yield methods for tracking epidemics through
populations.

Developing Diagnostic Tools 

The use of DNA hybridization probes for de« 
tecting changes, such as restriction fragment 
length polymorphisms (RFLPs), in the DNA of in- 
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Box 3-A.— Why Sequence Entire Genomes? 

Rapid advances in technology have made it feasible to sequence the entire genome of an organism, at 
lerst a small one such as bacteria or yeast. Researchers do not yet agree, however, on the value of a com- 
plete DNA sequence of a genome the size of the human genome. Several types of arguments have been 
made in favor of sequencing entire genomes: 

• The information in a genome is the fundamental description of a living system-it is what the cell 
uses to construct a copy of itself-and so is of fundamental concern to biologists. 

• Genome sequences provide a conceptual framework within which much future research in biology 
will be structured. Questions concerning control of gene expression (signals for control of gene expres- 
sion, genome replication, development mechanisms, and so on) ultimately depend on knowini? Renome 
sequences. ° ° 

• The genomes of some higher organisms, including those of human beings, have repeated DNA se- 
quences, sequences of unknown function, and some sequences which are nkely to have no function, 
comprising nearly 90 percent of the total DNA content. Without the complete DNA sequence of several 
genomes, it will be impossible to determine whether such sequences have meaning or are ancestral 
"junk" sequences. 

• Genome sequences are important for addressing questions concerning evolutionary biology. The recon- 
struction of the history of Uf e on this planet, the definition of gene families (also critical to other areas 
of biology), and the search for a universal ancestor all require an understanding of the organization 
of genomes. 

• Genomes are natural information storage and processing sytems; unraveling them may be of general 
interest to computer and physical scientists. 

Other scientists would argue that these possible applications can be derived from sequences of single 
genes or larger regions of chromosomes. They believe it is a waste of time and money to sequence the 
entire human genome, particularly because some regions have no known or essential function Many of 
these researchers favor sequencing only those regions believed to be clinically or scientificiaUy important, 
including expressed sequences and sequences involved in the control of gene expression, and putting the 
others off indefinitely. 

SOURCES 

National Kesearch Council, Mapping and Sequencing the Human Genome (U'ashington. DC National Academy Press, 19881 
L Woese. University of Illinois, Urbana. personal communication, June 1987 



Table 3-1.—Examples of Single-Gene Diseases 

                                                                            Genetic
                                                                            marker      Gene    Protein
Disease                       Description                                   identified  cloned  identified

Duchenne muscular dystrophy   Progressive muscle deterioration              Yes         Yes     Yes
Cystic fibrosis               Lung and gastrointestinal degeneration        Yes         No      No
Huntington's disease          Late-onset disorder with progressive          Yes         No      No
                                physical and mental deterioration
Sickle cell anemia            Deformed red blood cells block blood flow     Yes         Yes     Yes
Hemophilia                    Defect in clotting factor VIII causes         Yes         Yes     Yes
                                uncontrolled bleeding
Beta-thalassemia              Failure to produce sufficient hemoglobin      Yes         Yes     Yes
Chronic granulomatous disease Frequent bacterial and fungal infections      Yes         Yes     Yes (tentative)
                                involving lungs, liver, and other organs
Phenylketonuria               Enzyme deficiency that causes brain damage    Yes         Yes     Yes
                                and mental retardation
Polycystic kidney disease     Pain, hypertension, kidney failure in half    Yes         No      No
                                of victims
Retinoblastoma                Cancer of the eye                             Yes         Yes     Yes

SOURCE: Office of Technology Assessment, 1988. 







Photo credit: Ray White, The University of Utah Medical Center, Salt Lake City, 
UT. Reprinted with permission from American Journal of Human Genetics 39:300-306, 
1986. 

Identification of a genetic marker showing linkage between 
high levels of low-density lipoprotein (LDL) cholesterol 
and the genetic locus for the LDL receptor gene, 
using restriction fragment length polymorphism (RFLP) 
analysis of the LDL receptor genes from a multigenerational 
family with inherited hypercholesterolemia. A 
radioactively labeled DNA fragment from the cloned 
LDL receptor gene was used as a probe to observe 
differences among affected and unaffected individuals 
in the numbers of electrophoretically separated DNA 
fragments after cutting the DNA with a restriction enzyme. 
Individuals without the polymorphism are represented 
as unfilled squares (males) or circles (females) 
and show only one DNA fragment. Half-filled symbols 
represent individuals with one allele for the defective 
gene and one for the normal gene and show two DNA 
fragments. The lane marked "M" is a set of 
DNA fragment size markers. 



dividuals with genetic diseases is described in detail 
in chapter 2. Such methods of DNA analysis 
offer several advantages over traditional approaches 
to the study of human disease. Knowing 
the organization of genes on chromosomes 
and their DNA sequences could enable clinicians 
to detect mutant genes before a disease manifests 
itself in the form of damaged cells or tissues and 
will eventually lead to a more complete understanding 
of the pathogenesis of human disease 
[Friedmann, see app. A]. 

The study of randomly selected RFLP markers 
in human families has revealed linkages to a number 
of genetic diseases (table 3-1) 
(1,3,6,10,16,17,23,25,29,32,37,38,41,42,46,52,55,56). 
As the chromosomal locations of more disease-causing genes 
are identified, more probes for diagnosing genetic 
diseases will become available. A genetic linkage 
map saturated with RFLP markers (or one 
with other polymorphic markers) is viewed by 
many molecular geneticists as crucial to the 
development of diagnostic reagents for the remaining 
human genetic diseases [Friedmann, 
see app. A]. (See table 3-2 for a list of companies 
developing diagnostic probes for such diseases.) 

It is important to recognize that DNA probes 
for RFLP markers are not always reliable tools 
for diagnosing genetic diseases before the onset 
of symptoms. Without enough data from relatives 
of potential disease carriers, it may not be possible 
to confirm the linkage between a particular 
RFLP marker and a genetic disease. The main limitation 
to reliable diagnosis of most genetic diseases 
is the lack of an adequate number of DNA samples 
from several generations of affected and 
unaffected individuals. 
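
The effect of family size on the reliability of a linkage can be illustrated with a LOD score, a statistic commonly used in linkage analysis. The sketch below is only an illustration: the LOD score is not described in this chapter, the function name is hypothetical, and the calculation assumes the simplest case, in which every meiosis is informative and its phase is known. By convention, a score of about 3 is taken as evidence of linkage.

```python
import math


def lod_score(n_meioses, n_recombinants, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    theta is the assumed recombination fraction between marker and
    disease gene (0 = completely linked, 0.5 = unlinked).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k = n_recombinants
    n = n_meioses
    # Likelihood of the observed data if marker and gene are linked at theta.
    likelihood_linked = (theta ** k) * ((1.0 - theta) ** (n - k))
    # Likelihood of the same data if they assort independently.
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0.0:
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Same marker, no recombinants observed, small versus larger family set.
small_family = lod_score(4, 0, 0.0)
larger_family = lod_score(10, 0, 0.0)
```

Under these assumptions, four meioses with no recombinants give a score of about 1.2, while ten give about 3.0. The same marker therefore supports a linkage only when enough relatives have been sampled.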

Many available RFLP markers can be used only 
in a few families, and the RFLP marker map is 
a cumulative one that aggregates the data from 
many families. The largest standard data set is 
derived from the Center for the Study of Human 
Polymorphism (CEPH) in Paris (see ch. 7 on international 
efforts in genome mapping). The data collected 
by CEPH are taken from 40 families around 
the world, most of which do not have any known 
genetic disease. Materials from these families are 
used to locate RFLP and other polymorphic markers. 
Once markers have been identified, they can 
be tested for linkage to a particular genetic disease 
in families known to have that disease. The 



Table 3-2.—Some Companies Developing DNA Probes 
for Diagnosis of Genetic Diseases 

Company                        Probes under development

California Biotechnology       Susceptibility to heart disease
(Mountain View, CA)

Cetus Corporation              Sickle cell anemia
(Emeryville, CA)

Collaborative Research         Cystic fibrosis
(Bedford, MA)                  Duchenne muscular dystrophy
                               Polycystic kidney disease

Integrated Genetics            Cystic fibrosis
(Framingham, MA)               Hemophilia B
                               Huntington's disease
                               Polycystic kidney disease
                               Sickle cell anemia

Lifecodes                      Cystic fibrosis
(Elmsford, NY)                 Down's syndrome
                               Polycystic kidney disease
                               Sickle cell anemia

SOURCE: Office of Technology Assessment, 1988. 

Photo credit: The Bettmann Archive, New York, NY 

A large New England family of the early 1900s spanning three generations. Samples of genomic DNA from members 
of such families are very useful for constructing genetic linkage maps, such as an RFLP map. 



CEPH families are large, selected to enable scientists 
to trace DNA markers through at least three 
generations. 

Isolating Genes Associated With 
Disease 

Some inherited human diseases arise from or 
cause differences in detectable proteins that circulate 
in the blood, such as human growth hormone 
and insulin. A research scheme called forward 
genetics has been used to isolate the genes 
encoding these proteins. In this strategy, a gene 
is cloned after the altered protein product has 
been characterized. Other genetic diseases, such 
as retinoblastoma, chronic granulomatous disease, 
and Duchenne muscular dystrophy, involve protein 
products that were not identified before the 
corresponding gene was cloned. An experimental 
approach called reverse genetics was used to find 
these genes. First the gene containing the mutation 
responsible for the disease is linked to an RFLP 
or other polymorphic marker, then the gene and 
its protein product are isolated and characterized 
[Friedmann, see app. A]. 

Forward Genetics 

Until recently, most methods for cloning disease-associated 
genes required prior characterization 
of the biochemical defect responsible for the disease. 
Using the forward genetics approach, researchers 
identify the mutant gene product (a 
protein), then isolate a clone of the gene from a 
library of cDNA clones (clones made from DNA 
copies of the mRNA transcripts of genes; see ch. 
2). If the protein has been purified, antibodies can 
be made and used to select for clones of cells expressing 
the product. Alternatively, if part of the 
protein's amino acid sequence is known, synthetic 
DNA probes complementary to the exons, or 
protein-coding sequences, of the gene can be designed, 
based on the genetic code (figure 3-2). Once 
the cDNA clones are isolated, they can be used 
as DNA probes to pick out clones from genomic 
libraries. 
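
The design of a synthetic probe from a partial amino acid sequence can be sketched in a few lines of Python. Because most amino acids are specified by more than one codon, a peptide corresponds to a pool of possible DNA sequences; a short stretch with few alternatives makes the most practical probe. The peptide and all function names below are hypothetical and are not taken from this report; only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, written for the DNA coding strand (T in place of U).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons_for(amino_acid):
    """All codons that specify one amino acid (one-letter code)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    count = 1
    for amino_acid in peptide:
        count *= len(codons_for(amino_acid))
    return count


def least_degenerate_window(peptide, length):
    """Stretch of the peptide that needs the smallest probe pool."""
    windows = [peptide[i:i + length] for i in range(len(peptide) - length + 1)]
    best = min(windows, key=degeneracy)
    return best, degeneracy(best)


def reverse_complement(dna):
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))


def probe_pool(peptide):
    """Every probe complementary to a possible coding sequence."""
    coding = ("".join(codons) for codons in
              product(*(codons_for(aa) for aa in peptide)))
    return sorted(reverse_complement(seq) for seq in coding)


# Hypothetical eight-residue fragment of a purified protein.
window, pool_size = least_degenerate_window("LSRMWKDC", 5)
```

For this made-up fragment the stretch MWKDC needs a pool of only 8 probes, whereas the stretch LSRMW would need 216. This is why regions rich in methionine and tryptophan, which have a single codon each, are attractive targets for probe design.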
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Figure 3-2.— The Use of Synthetic DNA Probes To Clone Genes 
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The difference between cDNA copies of genes 
and genes on chromosomes is that the latter have 
both exons and introns (noncoding sequences in- 
terrupting protein-coding sequences). The genes 
in the human genome range from fewer than 
1,000 base pairs to more than 2 million base pairs 
in size and are thus typically too large to be con- 
tained on standard cloning vectors (table 3-3). The 
cDNA clones, which are smaller because they con- 
tain only exons, are useful because they can be 
introduced into bacteria, yeast, or mammalian tis- 
sue culture cells and transcribed and translated 
into protein. The resulting proteins can be used 
in studies of the physiology of diseases or in some 
cases as human therapeutics. 
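
The size difference between a gene and its cDNA copy can be made concrete with three entries from table 3-3. The sketch below treats the mRNA size as an approximation of the total exon length, and it uses a vector capacity of 40,000 nucleotides purely for illustration; that capacity and the function names are assumptions, not figures from this report.

```python
# (gene size, mRNA size) in thousands of nucleotides, from table 3-3.
GENE_SIZES_KB = {
    "Alpha-globin": (0.8, 0.5),
    "Adenosine deaminase": (32.0, 1.5),
    "Duchenne muscular dystrophy": (2000.0, 17.0),
}

# Illustrative assumption only; real capacities depend on the vector used.
ASSUMED_VECTOR_CAPACITY_KB = 40.0


def coding_fraction(gene_kb, mrna_kb):
    """Share of the gene that survives in the mature mRNA."""
    return mrna_kb / gene_kb


def fits_in_vector(size_kb, capacity_kb=ASSUMED_VECTOR_CAPACITY_KB):
    return size_kb <= capacity_kb


def summarize(table):
    """Per-gene coding fraction and whether gene or cDNA fits the vector."""
    return {
        name: {
            "coding_fraction": coding_fraction(gene_kb, mrna_kb),
            "gene_fits": fits_in_vector(gene_kb),
            "cdna_fits": fits_in_vector(mrna_kb),
        }
        for name, (gene_kb, mrna_kb) in table.items()
    }


summary = summarize(GENE_SIZES_KB)
```

On these figures more than half of the alpha-globin gene is represented in its mRNA, but less than 1 percent of the Duchenne muscular dystrophy gene is. Under the assumed capacity the full Duchenne gene could not be carried on one vector, while its cDNA could.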

The utility of the various types of physical 
maps in the forward genetics strategy depends 
on the purpose of isolating the gene. If only 
cDNA copies of a particular gene are needed for 
making large quantities of the protein product, 
then extensive genomic maps would not be necessary. 
If the cDNA copy of the gene is to be used 
as a DNA probe for isolating the whole gene from 
a collection of genomic DNA clones, or for studying 
the organization of the genome in the region 
of interest, then a contig map illustrating the order 
of DNA segments from the relevant portion 
of the genome would be very useful. 



Reverse Genetics 

Reverse genetics has made it possible to isolate 
genes associated with inherited diseases for which 
no specific biochemical defect has been established. 
To do this, the genetic disease is usually 
linked first to a particular chromosome by studying 
inheritance patterns at the DNA level. The general 
region of the gene on the chromosome is identified 
using DNA probes for RFLP markers on that 
chromosome. Samples of DNA from families of 
individuals afflicted with the genetic disease are 
tested with a set of DNA probes that hybridize 
to markers spaced throughout the chromosome 
until a linkage between the mutant gene that 
causes the disease and the RFLP is detected. The 
location of the gene is then identified more precisely 
by using additional probes for closely spaced 
markers that cover the region of interest. 

Table 3-3.—The Size of Human Genes 

                                     Gene size       mRNA size
                                     (in thousands   (in thousands   Number of
Gene                                 of nucleotides) of nucleotides) introns

Small:
  Alpha-globin                           0.8             0.5            2
  Beta-globin                            1.5             0.6            2
  Insulin                                1.7             0.4            2
  Apolipoprotein E                       3.6             1.2            3
  Parathyroid                            4.2             1.0            2
  Protein kinase C                      11.0             1.4            7
Medium:
  Collagen I
    Pro-alpha-1(I)                      18.0             5.0           50
    Pro-alpha-2(I)                      38.0             5.0           50
  Albumin                               25.0             2.1           14
  HMG CoA reductase                     25.0             4.2           19
  Adenosine deaminase                   32.0             1.5           11
  Factor IX                             34.0             2.8            7
  Catalase                              34.0             1.6           12
  Low-density-lipoprotein receptor      45.0             5.5           17
Large:
  Phenylalanine hydroxylase             90.0             2.4           12
  Factor VIII                          186.0             9.0           25
  Thyroglobulin                        300.0             8.7           36
Very large:
  Duchenne muscular dystrophy        2,000.0            17.0           50

SOURCE: Victor McKusick, The Johns Hopkins University, Baltimore, MD. 

In order to distinguish the gene locus that actually 
causes the disease from nearby but unrelated 
genes, it is generally necessary to demonstrate 
that the identified gene is expressed abnormally 
in tissues from patients with the disease. A 
genomic region 1 million base pairs in length, for 
example, could contain as many as 100 genes. In 
such cases, it is necessary to use biochemical methods 
to identify the gene that is responsible for the 
disease. Techniques for detecting messenger RNA 
transcripts or proteins can be used to search for 
differences in amounts of gene product in the tissues 
of affected and unaffected individuals; these 
differences can then be correlated with an alteration 
in a particular gene. Retinal cells were analyzed 
in this way as part of the search for the 
retinoblastoma gene (18,28), as were muscle cells 
in individuals with and without Duchenne muscular 
dystrophy (see box 3-B). Once the gene product 
has been identified, it is possible to study the 
physiology of a particular disease with the aim 
of identifying a therapy or preventive treatment. 

Although reverse genetics is generic in concept, 
the amount of effort involved in isolating and characterizing 
genes using genetic and physical map 
data varies. Over 100 person-years have been 
spent searching for the gene that causes cystic 
fibrosis, an effort that has led to localizing the 
gene on a small region of chromosome 7 but not 
to finding the gene itself or determining how it 
causes the disease (17). On the other hand, researchers 
identified and isolated the gene for 
chronic granulomatous disease in far fewer 
person-years (38). The existence of DNA probes 
for RFLP markers has also made possible the identification 
of the genes for Duchenne muscular dystrophy 
(32) and retinoblastoma (18,28) [Friedmann, 
see app. A]. 

The technical difficulty involved in locating the 
gene responsible for a particular disease by reverse 
genetics usually depends on the physical 
map distance between the nearest RFLP marker 
and the linked gene. Existing RFLP maps of the 
human genome have a resolution of only about 
10 centimorgans (approximately 10 million base 
pairs). A map with markers spaced every 1 
centimorgan would make it much less 
time-consuming to locate the genes by reverse 
genetics (7,33). Such a map would be constructed 
using a pool of several thousand DNA probes that 
detect RFLP markers spaced about every 1 million 
base pairs throughout the genome. A library 
of clones made from overlapping segments of 
the genome and a contig map illustrating the 
relative position of each clone with respect to 
its neighbors would also be useful in reverse 
genetics. These tools would spare researchers 
the labor-intensive step of isolating and characterizing 
all of the genomic clones between the 
marker and the gene of interest; the only work 
remaining would be to associate the characteristics 
of the disease with the correct clone or clones. 
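
The arithmetic behind these map resolutions can be checked with a short calculation. The sketch uses two figures quoted in this chapter (roughly 1 million base pairs per centimorgan, and as many as 100 genes per million base pairs) and assumes a genome of about 3 billion base pairs; that genome size and the function names are assumptions made for the example.

```python
BP_PER_CENTIMORGAN = 1_000_000          # rough average used in the text
MAX_GENES_PER_MILLION_BP = 100          # upper bound quoted in the text
ASSUMED_GENOME_SIZE_BP = 3_000_000_000  # assumption made for this sketch


def markers_needed(spacing_cm, genome_bp=ASSUMED_GENOME_SIZE_BP):
    """Evenly spaced markers required to cover the genome."""
    return genome_bp // (spacing_cm * BP_PER_CENTIMORGAN)


def max_candidate_genes(interval_cm):
    """Upper bound on genes lying between two adjacent markers."""
    interval_bp = interval_cm * BP_PER_CENTIMORGAN
    return interval_bp * MAX_GENES_PER_MILLION_BP // 1_000_000


coarse_map = (markers_needed(10), max_candidate_genes(10))
fine_map = (markers_needed(1), max_candidate_genes(1))
```

Under these assumptions a 10-centimorgan map needs about 300 markers and leaves up to 1,000 candidate genes between neighboring markers. A 1-centimorgan map needs about 3,000 markers, consistent with the "several thousand" probes mentioned above, and narrows the search to at most about 100 genes.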

Identification of Genes Involved in 
Polygenic Disorders 

Genetic linkage maps of the human genome are 
also useful for characterizing inherited diseases 
caused by more than one factor, often referred 
to as polygenic disorders. Among the diseases for 
which more than one gene is likely to be responsible 
are certain cancers, diabetes, and coronary 
heart disease (27). For example, in a complex disorder 
such as coronary heart disease, blood 
plasma lipoproteins, the coagulation system, and 
elements of the arterial walls all play a role, so 
the number of genes involved can be very large 
(40). Some scientists argue that the RFLP maps 
currently available, with markers spaced an average 
of 10 centimorgans apart, are a sufficient starting 
point for studies of polygenic diseases (11). 
Higher-resolution RFLP maps, such as a 
1-centimorgan map, would no doubt simplify 
the job of identifying the genes responsible for 
polygenic disorders. 

Developing Human Therapeutics 

Forward genetics has yielded important results 
in the area of drug development. As stated earlier, 
the ability to use cDNA clones has been crucial 
to the development of commercial products such 
as human growth hormone and insulin and to po- 
tential human therapeutics such as tumor necro- 
sis factor and interleukin-2, therapeutics that 
would not otherwise be available in the quantity 
or quality necessary for effective use (table 3-4) 
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Box 3-B.— Duchennf? and Becker's Muscular Dystrophies 

Duchenne muscular dystrophy (DMD) is a genetic disease that affects 1 in 3,000 to 1 in 3,500 male 
infants born. Becker's muscular dystrophy is a similar but milder disorder with much lower incidence. 
Both diseases begin in childhood and lead to muscle wasting. DMD typically results in death before age 
20. The search for the gene causing these diseases and the protein encoded by that gene has been an excit- 
ing story of molecular biology in ihe 1980s. The effort in many ways typifies modern genetics, with exten- 
sive international collaboration, study of nonhuman species, and creative use of molecular methods. 

The gene causing these diseases had been known for some time to be on the X chromosome because 
of inheritance patterns. Duchenne and Becker's muscular dystrophies affect primarily boys, wh^ have only 
one X chromosome, inherited from their mothers. Girls have two X chromosomes and therefore must, 
as a rule, receive abormal genes from both parents in order to develop Duchenne or Becker's muscular 
dystrophy— a much less likely occurrence. 

The search for the gene started with studies of families. DNA from persons with DMD, including several 
girls and one boy, was collected in an effort to find a common area of the X chromosome that had been 
lost or altered. Once the correct region of the X chromosome had been identified (its absence was found 
to cause DMD), DNA from that region was obtained and cloned. The clones were used as DWA probes 
for complementary mRNA sequences in muscle tissue from affected and unaffected individuals. The pur- 
pose was to identify the mRNA gene transcript that was present in unaffected individuals but altered in 
persons with DMD. The mRNA was located and subsequently shown to encode a large protein called dystro- 
phin found in muscle cells. 

The DMD search has uncovered some extraordinary facts. Duchenne and Becker's muscular dv^ rophies 
are caused by different changes in the same gene. That gene is the largest found to date, spanning over 
2 million base pairs (table 3-3). It is broken into at least 60 exons. 

The scientific collaboration that led to the discovery of dystrophin was notably efficient. One paper 
alone listed 77 authors from 24 research institutions in 8 countries. Molecular probes, clones, and materials 
from affected patients were openly exchanged, hastening researchers in their quest for the culprit gene. 

SOURCES 

K.H. Fischbeck, A.W. Ritter, D.L. Tirschwell, et al., "Recombination With pERT87 (DXS164) in Families With X-Linked Muscular Dystrophy," Lancet 
2(July):104, 1986. 

E.P. Hoffman, A.P. Monaco, C.C. Feener, et al., "Conservation of the Duchenne Muscular Dystrophy Gene in Mice and Humans," Science 238:347-350, 1987. 
E.P. Hoffman, R.H. Brown, and L.M. Kunkel, "Dystrophin: The Protein Product of the Duchenne Muscular Dystrophy Locus," Cell 51:919-928, 1987. 
M. Koenig, E.P. Hoffman, C.J. Bertelson, et al., "Complete Cloning of the Duchenne Muscular Dystrophy (DMD) cDNA and Preliminary Genomic Organization 
of the DMD Gene in Normal and Affected Individuals," Cell 50:509-517, 1987. 
L.M. Kunkel, et al., "Analysis of Deletions From Patients With Becker and Duchenne Muscular Dystrophy," Nature 322:73-77, 1986. 
X Chromosome," Cell 47:499-504, 1986. 



(53). The cDNA clones isolated by forward genetics 
could be used to make a cDNA map that illustrates 
the chromosomal locations of expressed regions 
of DNA. This cDNA map, plus a library of previously 
ordered clones of genomic DNA, would 
be valuable tools for studying the role of certain 
genes in the manifestation of disease. 
Knowledge of the mechanisms directing normal 
cellular functions will probably lead to important 
sources of new therapies for human diseases: natural 
human proteins made from isolated human 
genes, engineered proteins, and conventionally 
synthesized drugs designed from a knowledge of 
the structure of the proteins they target. Advances 
in the development of human therapeutic 
products will be made more rapidly if research 
in the areas of protein engineering, the 
relationship of protein structure to function, 
rational drug design, and others parallels genome 
mapping efforts. 

Photo credit: Nancy Wexler, Columbia University, New York, NY 

A Venezuelan man with Huntington's disease, a rare, 
late-onset genetic disease that causes degeneration 
of nerve cells in the brain. 

Prospects for Human Gene Therapy 

Clinical use of human genetic linkage and physical 
maps, now largely restricted to diagnosis, may 
eventually include the insertion of normal DNA 
directly into human cells to correct a particular 
genetic defect (54). This practice is called human 
gene therapy. Advances in gene therapy will depend 
on development of ways to insert DNA into 
cells safely and to ensure that the inserted DNA 
corrects the defect (54). Gene mapping will not 
improve gene therapy directly, and for most diseases 
the ability to make a diagnosis will precede 
the availability of an effective treatment. The 
knowledge gained through use of gene maps will, 
however, enhance knowledge about the function 
of genes and thus indirectly improve the prospects 
for gene therapy [Friedmann, see app. A]. 



Table 3-4.—Some Human Gene Products With 
Potential as Therapeutic Agents 

Gene product 
• Actual or potential therapeutic application 

Atrial Natriuretic Factor 
• Possible applications in treatment of hypertension 
  and other blood pressure disorders, and for some 
  kidney diseases affecting excretion of salts and 
  water. 

Alpha Interferon(a) 
• Approved for treatment of hairy cell leukemia; 
  possible broader applications in other cancers. 

Beta Interferon 
• Inhibits viral infections and may be useful as an 
  anticancer treatment. 

Epidermal Growth Factor 
• Expected to have applications in wound healing, 
  including burns, and for cataract surgery. 

Erythropoietin 
• Anticipated treatment use for anemia resulting 
  from chronic kidney disease. 

Factor VIII:C(b) 
• Prevents bleeding in patients with hemophilia A 
  after injury. 

Fibroblast Growth Factor 
• Possible use in wound healing and treating burns. 

Gamma Interferon 
• Possible treatment for scleroderma and arthritis. 

Granulocyte Colony Stimulating Factor 
• Possible treatment for Acquired Immune Deficiency 
  Syndrome (AIDS) and leukemia. 

Human Growth Hormone(a) 
• Approved as a treatment for childhood dwarfism; 
  expected to have broader therapeutic potential in 
  treatment for short stature resulting from Turner's 
  syndrome and for wound healing. 

Insulin(a) 
• Approved for treatment of diabetes. 

Interleukin-2 
• Possible treatment for various cancers. 

Macrophage Colony Stimulating Factor 
• Potential applications are for treatment of 
  infectious diseases, primarily parasites; possible 
  cancer therapy. 

Superoxide Dismutase 
• Possible preventive treatment for damage caused 
  by oxygen-rich blood entry into oxygen-deprived 
  tissues (e.g., during organ transplants). 

Tissue Plasminogen Activator 
• Approved as treatment for dissolving blood clots 
  associated with heart attacks. 

Tumor Necrosis Factor 
• Possible anti-tumor therapy. 

(a) Approved for commercial sale in the United States by the Food and Drug 
    Administration. 
(b) Non-recombinant DNA version has been approved for sale, but the cloned gene 
    product has not. 

SOURCE: Office of Technology Assessment, 1988. 

APPLICATIONS IN HUMAN PHYSIOLOGY AND DEVELOPMENT 



Studies aimed at understanding the molecular 
basis of inherited diseases may yield information 
that can be generalized to other physiological processes. 
Knowledge of the structure and function 
of genes associated with Alzheimer's disease, for 
example, might give important clues to the cellular 
mechanisms regulating aging of brain tissue. 

The organization of genes in genomes is another 
fundamental issue in biology. Is it important for 
genes to exist on a particular chromosome in a 
particular order? Comparisons of physical maps 
of the chromosomes of higher organisms could 
shed some light on the extent to which gene organ- 
ization is associated with gene expression and gene 
function. 

The nucleotide sequences of human genes have 
been and will continue to be important research 
tools for understanding the basic cellular processes 
underlying physiology and development. 
Nevertheless, knowing the DNA sequences of 
genes and how they translate into the amino acid 
sequences of protein products is not sufficient to 
establish how such genes are controlled or how 
the gene products function in a particular cell or 
in the organism as a whole. The genetic code that 
guides the translation of DNA sequence into protein 
sequence offers only the first step in unraveling 
the mysteries of the human genome. Understanding 
the relationship between protein 
structure and function is the crucial next step, 
but it faces the greatest number of technical bottlenecks 
(see box 3-C). 

Identification of Protein-Coding 
Sequences 

Individual efforts to clone particular genes 
will not be eliminated by the availability of 
genetic linkage and physical maps; rather, they 
will be redirected toward localizing a particular 
gene within a region of a chromosome or 
within the DNA sequence of that region. Because 
human genes are more often interrupted 
by introns, the identification of the exon and regulatory 
sequences in and around genes has proved 
more difficult in human beings than in lower 
organisms. The most reliable method of identifying 
exons is to know the amino acid sequence of 
the protein product and, using the genetic code, 
find the corresponding DNA sequence by inspecting 
the whole gene sequence. DNA sequences can 
be determined at a faster rate than proteins can 
be isolated and sequenced, however, so computer-assisted 
methods offer a more practical approach. 
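
The inspection method described above, in which a known amino acid sequence is used to find the matching exon, can be sketched as follows. The code translates a stretch of genomic DNA in its three forward reading frames and reports where a known peptide fragment is encoded. The genomic sequence, the peptides, and the function names are invented for the example; only the standard genetic code is assumed.

```python
# Standard genetic code, written for the DNA coding strand (T in place of U).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame):
    """Translate one forward reading frame; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(frame, len(dna) - 2, 3))


def find_peptide(dna, peptide):
    """DNA coordinates (start, end) of every stretch encoding the peptide.

    Coordinates are 0-based with an exclusive end. Only the forward
    strand is searched.
    """
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        index = protein.find(peptide)
        while index != -1:
            start = frame + 3 * index
            hits.append((start, start + 3 * len(peptide)))
            index = protein.find(peptide, index + 1)
    return sorted(hits)


# Invented gene: flanking bases, exon 1, an intron, then exon 2.
GENOMIC = "CC" + "ATGGCTAAA" + "GTAAGTCCCTTTCAG" + "GATTGGTGC"

exon1_hit = find_peptide(GENOMIC, "MAK")   # peptide encoded by exon 1
exon2_hit = find_peptide(GENOMIC, "DWC")   # peptide encoded by exon 2
junction = find_peptide(GENOMIC, "KD")     # peptide spanning the intron
```

In this example each of the two peptides maps back to its own exon, but the fragment "KD", which straddles the intron, is not found at all. The method therefore works best with peptide fragments short enough to lie within a single exon.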

There is a variety of computer software available 
for predicting exon sequences, some of it 
more reliable than others (12,14,48) [Mount, see 
app. A]. As more DNA sequences become available, 
methods for predicting exons can continue 
to be refined. Computer scientists argue that 
the analysis phase of whole genome sequencing 
projects will progress efficiently only if the 
development of new computational and other 
theory-based predictive methods that can accommodate 
large sequences is emphasized. 




Photo credit: Shirley Tilghman, Princeton University, Princeton, NJ 

Electron micrograph revealing an intron sequence interrupting 
the protein-coding sequences of the mouse 
beta-globin gene. DNA containing the gene (including 
intron sequences) was allowed to hybridize (base pair) 
with beta-globin mRNA that had been isolated from 
cells in its mature form with no intron sequences. A 
loop appears in the region of the intron where no complementary 
sequences exist between the two 
molecules (see arrow). 

Box 3-C.—From Gene Structure to Protein Structure: The Protein-Folding Problem 

"Protein-folding is the genetic code expressed in three dimensions," according to Fetrow and co-workers. 
How does the linear sequence of amino acids code for a protein's structure? How does the three-dimensional 
conformation of a protein drive its function? Sometimes the amino acid sequence of a protein with an 
unknown function is similar to that of a protein with a known function; in many such cases, the similarity 
is a valid indicator of comparable jobs. In other cases, the three-dimensional structure of a protein (the 
amino acid sequence folded into the actual structure of the protein) gives more reliable clues about function. 
It is therefore important to develop experimental and theoretical means for determining the three-dimensional 
structures of proteins. Because proteins are so large, often consisting of multiple domains (discrete 
portions) with different functions, this generally involves analysis of how each part of a protein contributes 
to its overall structure. 

There is experimental evidence that certain structural domains can serve similar functions in a number 
of different proteins. It is the combination of domains that gives a protein its unique overall function. A 
stretch of amino acids in one protein can be nearly identical in sequence to that in another protein, but 
if the surrounding amino acid sequences are different, then the sequences might fold into domains with 
quite different three-dimensional structures. At present, scientists cannot predict with certainty how the 
linear sequence of amino acids in a protein will fold into the protein's three-dimensional structure; this 
is the protein-folding problem. As genome mapping projects make more gene sequences available, the problem 
will take on even greater significance. The National Academy of Sciences in a recent report called 
protein folding "the most fundamental problem at the chemistry-biology interface, and its solution has the 
highest long-range priority." 

Most predictions of three-dimensional structure are based on theories of the behavior of amino acids 
in certain chemical and physical environments and on information gleaned from viewing the atomic structures 
of proteins through X-ray diffraction. (X-ray diffraction of protein crystals is an important tool in 
structural biology, the field dedicated to the study of proteins and other macromolecular structures. It 
is the most important technique for determining the three-dimensional structure of large proteins at the 
atomic level.) Existing methods for predicting structure are not reliable for all proteins or protein domains, 
because structural data are available for only about 200 proteins and for even fewer classes of proteins. 
There are few membrane proteins in the structure database, for example, and thus little experimental 
basis for testing predictions about how such important proteins will fold. More structures of proteins need 
to be determined, using X-ray crystallographic and other biophysical technologies, in order to provide a 
solid foundation for protein-folding theories. Once the protein-folding problem is solved, the road from 
gene sequence to gene function will be considerably shortened and will lead, in some cases, toward the 
development of promising new human therapeutic products. 

SOURCES 

T. Blundell, B.L. Sibanda, M.J.E. Sternberg, and J.M. Thornton, "Knowledge-Based Prediction of Protein Structures and the Design of Novel Molecules," 
Nature 326:347-352, 1987. 

J.S. Fetrow, M.H. Zehfus, and G.D. Rose, "Protein Folding: New Twists," Bio/Technology 6:167-171, 1988. 

T. Koetzle, Brookhaven National Laboratory, Upton, NY, personal communication, March 1987. 

National Academy of Sciences, Research Briefings (Washington, DC: National Academy Press, 1986). 



Approaches to Understanding 
Gene Function 

Isolating a gene is not nearly as difficult as de- 
termining how the gene and its products func- 
tion in the cell. The following are some experi- 
mental approaches to solving this problem: 



• to modify or inactivate the normal function 
of a gene by replacing it with a modified 
version, 

• to inhibit the function of a gene's mRNA or 
protein product by using antibodies to the 
protein or an RNA complementary to the 
mRNA, and 






• to compare the DNA sequence of a cloned 
gene of unknown function with those of genes 
whose functions are known. 

Using the first two methods, scientists have studied 
the function of gene products by identifying 
alterations in the biochemical or physical characteristics 
of the affected cell or organism 
(2,15,24,26,30,36,39,44,50,51). The third strategy is 
theoretical, using sequence data accumulated from 
previously characterized gene products to predict 
a function for a newly identified gene. Such 
predictions can then be tested experimentally. 
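
The third strategy, comparing a new sequence with sequences of known function, can be illustrated with a deliberately simple measure. The sketch below scores two DNA sequences by the fraction of short subsequences (4 bases long here) that they share. This is a stand-in chosen for brevity, not the alignment methods used by actual comparison software, and the sequences and names are invented.

```python
def kmers(sequence, k=4):
    """Set of all overlapping subsequences of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(seq_a, seq_b, k=4):
    """Shared k-mers divided by total distinct k-mers (0.0 to 1.0)."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query, known_sequences, k=4):
    """Name and score of the known sequence most similar to the query."""
    scored = ((name, similarity(query, seq, k))
              for name, seq in known_sequences.items())
    return max(scored, key=lambda pair: pair[1])


# Invented sequences standing in for genes of known function.
KNOWN = {
    "kinase_like": "ATGGCTAAAGGTCTGGAA",
    "globin_like": "ATGGTGCACCTGACTCCT",
}
QUERY = "ATGGCTAAAGGTCTGGAT"   # newly cloned gene of unknown function

match_name, match_score = best_match(QUERY, KNOWN)
```

Here the query is assigned to the "kinase_like" sequence, from which it differs by a single base. As the text notes, such a match is only a prediction of function and must still be tested experimentally.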

Probably the most widely used first step in determining 
the role of a gene is to find similarities 
between its DNA sequence and those of genes 
from other organisms. Yeast, for example, shares 
with animal cells many of the molecules and processes 
that are being studied intensively in modern 
cell biology, including the factors modulating 
cell structure and dynamics, the components of 
the machinery that modulates protein secretion 
from cells, the constituents of basic chemical pathways, 
and analogs of several mammalian oncogenes 
(genes involved in controlling the rate of 
cell growth). Many of these factors and processes 
were first identified or characterized, or both, in 
higher organisms, but the application of them to 
yeast genetics has provided new insights (57). 

A recent study reported the use of yeast cells 
to isolate a human gene that can substitute for 
a yeast gene in regulating the yeast cell's life cy- 
cle (34). Plasmid vectors carrying cDNA were in- 
troduced into yeast cells to find a human gene 
product that was similar enough to a yeast gene 
to replace it in the regulation of the yeast cell's 
life cycle. This was accomplished by mutating the 
yeast's copy of the gene and then finding cells that, 
upon introduction of the appropriate human gene, 
appeared to regain their normal function. The hu-
man cDNA clone identified by this technique is
thus a candidate for a protein that regulates life
cycles in human cells. Studies of the genomic ver-
sion of the human gene will be necessary to defini-
tively establish the role of this product in human
cell cycle regulation. This example illustrates how
yeast genetics and biochemistry can be used to
identify human genes with important functions
(57).

Photo credit: Donald Riddle, University of Missouri, Columbia, MO

Micrograph comparing the appearance of a short, fat
nematode mutant called "dumpy" (above) with that of
a normal nematode (below). The mutant grows to only
two-thirds of the normal body length because of a mu-
tation in a gene for a type of collagen (a protein) needed
for normal development (magnified 80 times).

In fruit flies, genetic research and the tools of 
recombinant DNA have made it clear that certain 
DNA sequences are involved in regulating the de-
velopment of the organism. Different sets of genes
appear to be expressed at different times in the
course of development, causing the patterns ob-
served in the developing embryo. How gene ex-
pression is regulated to create developmental pat-
terns is a central question in biological studies of
many organisms. Fruit flies are easy to dissect and 
to manipulate genetically, and much is known 
about their development; they have therefore 
proven to be a very useful model. One DNA se-
quence, called the homeo box, was first identi-
fied in the genes of fruit flies and later in those
of higher organisms, including human beings (31). 
There is substantial evidence that the homeo box, 
a short stretch of nucleotides of nearly identical 
sequence in the genes that contain it, determines 
when the expression of particular groups of genes 
is turned on and off during development of the 
fruit fly (35). As more gene sequences from fruit 
flies, human beings, and other organisms are de- 
termined, more knowledge about the signals 
governing developmentally expressed genes is 
likely to be acquired [Mount, see app. A].

Mutations in the gene Antennapedia, a homeotic gene, cause the fruit fly to develop an extra pair of antennas. Pictured
at left is the normal fruit fly, and at right a fly with the mutation. Homeotic genes have counterparts in humans and
other vertebrates; each gene has a characteristic DNA sequence within its protein-coding sequences called the homeo box.



APPLICATIONS IN MOLECULAR EVOLUTION 



The disciplines of population biology, genetics, 
molecular biology, and cellular biology merge in 
the study of how species evolve, constituting the 
field of molecular evolution. The construction of 
a physical map of the human genome will permit 
molecular analysis of several questions fundamen- 
tal to evolution, including how genomes change 
and what factors cause them to change, as well 
as how small-scale changes relate to the overall 
evolution of the organism (45). 

Species with different degrees of relatedness 
can be usefully compared because their genes, 
and thus the proteins encoded by those genes, 
will have differing rates of sequence divergence. 
The course of human evolution can be read in 
the sequences of proteins (14). Comparisons of 
human and mouse DNA sequences are probably 
the most useful in the identification of genes
unique to higher organisms because mouse genes
are more homologous to human genes than are 
the genes of any other well-characterized organ- 
ism. Comparisons of human DNA sequences with 
those of lower organisms such as the fruit fly or 
nematode are most useful in the identification of 
genes encoding proteins that are essential to all 
multicellular organisms. Finally, since yeasts are 
single-celled eukaryotes (cells whose chromo- 
somes are contained in nuclei), their sequences 
are most useful in the identification of genes that 
make proteins whose functions are essential to 
the life of all eukaryotic cells because such pro- 
teins would be least likely to have undergone ma- 
jor changes in the course of their evolution 
[Mount, see app. A]. Table 3-5 shows how human
proteins can be classified by their period of in- 
vention: from ancient, to middle-age, to modern 
(14).

[Figure: phylogenetic tree with branches for the brown bear, polar bear, Asiatic black bear, American black bear, sun bear, sloth bear, spectacled bear, giant panda, raccoon, and red panda, joining the other carnivore families at the base.]

Photo credit: Stephen O'Brien, The National Cancer Institute, Frederick, MD. Reprinted with permission from Scientific American, November 1987, pp. 102-107.

A phylogenic tree based on data obtained from modern molecular genetic methods places the giant panda in the Ursidae,
or bear family. The red panda is left in the Procyonidae, or raccoon family. Molecular analysis of the chromosomes of
these pandas suggests that the raccoon and bear families diverged from a common
carnivorous ancestor about 35 to 40 million years ago.



Table 3-5.— Classification of Human Proteins 
by Invention Period 

I. Ancient proteins 

A. First editions. Direct-line descendancy to human
and contemporary prokaryotes. Mostly enzymes in-
volved in metabolism.

B. Second editions. Homologous sequences in human
and prokaryotic proteins, but apparently different
functions.

II. Middle-age proteins

Proteins found in most eukaryotes but prokaryotic
counterparts are as yet unknown.

III. Modern proteins

C. Recent vintage. Proteins found in animals or plants
but not both. Not found in prokaryotes.

D. Very recent inventions. Proteins found in vertebrate
animals but not elsewhere.

E. Recent mosaics. Modern proteins clearly the result
of shuffling exons.

SOURCE: Adapted from Doolittle, R.F., Feng, D.F., Johnson, M.S., and McClure,
M.A., "Relationships of Human Protein Sequences to Those of Other
Organisms," Cold Spring Harbor Symposia on Quantitative Biology
51:447-456, 1986.



Physical map and sequence data accumulated 
from many species over the past 10 years have 
led scientists to recognize patterns of genome 
change quite different from those proposed 
earlier. Now, molecular evolutionists are begin- 
ning to understand such patterns as the duplica- 
tion and acquisition of new genes and their cor- 
responding functions, differences in the use of 
the genetic code among different organisms (48), 
and differences in the occurrence of gene fam- 
ilies in different species (45). 



Important questions in molecular evolution arise 
from the fact that the genes of prokaryotes (organ- 
isms without nuclei, e.g., bacteria), as well as many 
genes in yeast and some multicellular organisms, 
are not interrupted by introns:

• Are the introns found in genes today descend-
ants of extra or unused DNA from bacteria
and eukaryotes such as yeast?

• Did prokaryotes rid themselves of intron se-
quences, or did they never have them?

• How did intron sequences get into the genes 
that code for modern proteins (14)? 

By sequencing similar genes from many species,
scientists have found that some introns have been
in place for very long evolutionary periods and
that the positions of introns within genes divide
the genes in ways that correspond to the distinct
functional domains of the proteins' structures
(5,22,49). These observations have led to new
models of molecular evolution (8,13,19-21,47). The
availability of more gene sequence data should 
facilitate the assessment of theories about the evo- 
lution of genes and gene structures. 



Sequences of more human genes, high-level
understanding of variations in genomic orga-
nization among individuals, and analyses of
differences between human beings and other
organisms should aid in the evaluation of
molecular evolutionary theories on how spe-
cies originate (see boxes 3-D, 3-E, and 3-F). Rec-
ognition of differences in rates of nucleotide sub-
stitution, recombination, and other mechanisms
responsible for variation in the human genome
will lead to a better understanding of the molecu-
lar basis of these processes and of the constraints
on each. Evaluation of proposed models for the
propagation and evolution of multigene families,
such as certain classes of cell surface receptors,
requires a detailed knowledge not only of the relat-
edness of the DNA sequences in these genes, but
also of their locations in the genome and the DNA
sequences of the regions surrounding them (45).



Box 3-D.—Constructing the Evolutionary Tree: Morphology v. Molecular Genetics
in the Search for Human Origins

Ever since Linnaeus, biologists have classified animals according to similarities and differences in form
and structure. When the concept of evolution took root, these morphological features were used to estab- 
lish phylogenies—trees or lineages that indicate the evolutionary relationships among species. New and 
sophisticated methods of genetic analysis have challenged morphology as the prime determinant of family 
trees. Recent debates about human origins have revealed the potential power of genetic techniques for
evolutionary studies. 

For the past two decades or so, anthropologists and biologists studying the problem of primate evolu-
tion have agreed that chimpanzees and gorillas are closely related enough to be classified in the same fam-
ily, while humans stand alone in a separate, more distant family. Morphological evidence favors this view.
Both chimps and gorillas, for example, walk on their knuckles; humans do not, and the fossils of their 
most direct ancestors show no features associated with knuckle walking. Chimps and gorillas also share 
similarities in the thickness and structure of their tooth enamel which suggest a common ancestry separate 
from humans. 

Analyses of the DNA of chimps, apes, and human beings contradict this view. Scientists recently exam-
ined comparable segments of DNA in the region of the beta-globin gene from human beings, chimpanzees,
gorillas, and orangutans. They sequenced 4,900 base pairs of DNA from this region in each organism, then
appended data for nearby regions for which sequences had already been published. In all, they compared 
a 7400-base-pair region and concluded that chimpanzee and human gene sequences were the least diver- 
gent. The most parsimonious explanation of the data was that human and chimpanzee are more closely 
related to each other than either is to the gorilla. 
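The comparison described above can be sketched in simplified form. The Python below counts pairwise differences between aligned sequences and picks the least divergent pair; this is a plainer stand-in for the parsimony analysis the researchers used, and the 12-base fragments in `TOY_SEQUENCES` are invented for illustration, not taken from the beta-globin region.

```python
from itertools import combinations


def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def least_divergent_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """Return the pair of species whose sequences differ at the fewest sites."""
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: count_differences(seqs[pair[0]], seqs[pair[1]]),
    )


# Invented 12-base fragments, NOT real beta-globin data.
TOY_SEQUENCES = {
    "human":      "ACGTTGCAAGCT",
    "chimpanzee": "ACGTTGCAAGCC",
    "gorilla":    "ACGTAGCAAGTC",
    "orangutan":  "ATGTAGCTAGTC",
}

if __name__ == "__main__":
    for a, b in combinations(sorted(TOY_SEQUENCES), 2):
        print(a, b, count_differences(TOY_SEQUENCES[a], TOY_SEQUENCES[b]))
    print("least divergent:", least_divergent_pair(TOY_SEQUENCES))
```

With thousands of real base pairs rather than a dozen invented ones, the same kind of tally is what allows one grouping of species to be preferred over another.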

The beta-globin study, while it strongly suggests that chimpanzees are the closest cousins of human 
beings, does not conclusively end the search for human origins. Contradictions remain in the evidence
gathered from comparative anatomy and from genetic analysis; studies of other gene loci will be necessary 
to settle the matter. 

SOURCES 

R.L. Cann, "In Search of Eve," The Sciences (September/October):30-37, 1987.
R. Lewin, "My Close Cousin the Chimpanzee," Science 238:273-275, 1987.

M.M. Miyamoto et al., "Phylogenetic Relations of Humans and African Apes From DNA Sequences in the Globin Region," Science 238:369-373, 1987.

Box 3-E.—The Origin of Human Beings: Clues From the Mitochondrial Genome

For more than a century, archaeologists, anthropologists, and biologists have been digging through 
layers of dirt and rock, sieving fossils and artifacts, in an attempt to figure out when, where, and how
human beings differentiated from other primates to become a unique species. These scientists have relied 
on a variety of tools, everything from the picks and axes used to dig up fossils to sophisticated techniques 
for determining the age of the bones they have unearthed. Unfortunately, archaeological digs do not always 
yield perfect clues: Even well-preserved fossil remains are generally incomplete, and there are still missing 
links, cases in which fossils that could hint at the genealogy of several precursor species have not been 
found. Thus, it has been difficult to determine exactly when human beings diverged from prehistoric ances-
tors to become the species now known as Homo sapiens.

The development of molecular genetic techniques for analyzing DNA offers a new source of evidence 
in the ongoing debate about human origins. Techniques for mapping and sequencing DNA allow research- 
ers to compare different species and different individuals from the same species at the most basic level. 
These comparisons can aid evolutionary studies. 

One promising approach is the study of the DNA sequences of mitochondria, small structures that are
found in the cells of all multicellular organisms. Mitochondria are the power plants of eukaryotic cells. 
They produce energy for life processes by providing a site for the combination of oxygen and food molecules. 
Without them, cells would depend on less efficient processes of energy production and could not survive 
in an environment containing oxygen. Mitochondria have much in common with bacteria: They are similar 
in size and shape, they both contain DNA, and they each reproduce by dividing in two.

The DNA in mitochondria can be more useful for some evolutionary studies than the DNA in cell nuclei, 
for several reasons. First, since it lies outside the cell nucleus and sexual recombination occurs only within
the nuclei of sperm and egg cells, mitochondrial DNA is not recombined during sexual reproduction. It
is inherited only from the mother. Consequently, changes in the nucleotide sequences are due only to muta-
tion and not to the natural shuffling of DNA that occurs during reproduction. Second, DNA in the mitochon-
dria is not protected as well as DNA in the nucleus, nor does it have the same kinds of mechanisms for
repair. Thus, mitochondrial DNA mutates about 10 times as fast as the chromosomal DNA in the cell's nu-
cleus, which means that the mitochondrial genome has evolved more rapidly than the chromosomal ge-
nome. Finally, the mitochondrial genome is relatively small: It contains approximately 16,000 base pairs, considera-
bly fewer than the 3 billion base pairs in the entire set of human chromosomes, making it easier to analyze.

These three characteristics of mitochondrial DNA (absence of sexual recombination, a high natural
mutation rate, and small size) have helped scientists construct a "molecular clock" that can be used to
help establish the approximate time and place of human origins. By calculating the rate at which mitochon-
drial DNA changes and then comparing the DNA sequences of mitochondria from many individuals, re-
searchers have begun to formulate genealogical trees. For example, scientists sequenced samples of mitochon-
drial DNA from 140 people around the world and used the information to propose that the first Homo
sapiens lived 200,000 years ago on the African continent. Prior to these findings, anthropologists speculated 
that human beings originated nearly 1 million years ago. Debate continues among scientists about the valid- 
ity and proper application of mitochondrial DNA sequences in evolutionary studies, but it is clear that molecu- 
lar genetics will play a growing role in this area. 
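The arithmetic behind a molecular clock can be sketched as follows. This is a minimal illustration, assuming a constant substitution rate and ignoring repeated changes at the same site; the function name and all input values are hypothetical and are not the figures used in the study described above. Because two lineages each accumulate changes after they split, the observed fraction of differing sites is divided by twice the per-lineage rate.

```python
def divergence_time_years(differences: int, sites: int,
                          rate_per_site_per_year: float) -> float:
    """Estimate years since two sequences shared a common ancestor.

    Assumes a constant substitution rate on both lineages, hence the
    factor of 2 in the denominator.
    """
    if sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites and rate must be positive")
    if not 0 <= differences <= sites:
        raise ValueError("differences must be between 0 and sites")
    fraction_different = differences / sites
    return fraction_different / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Hypothetical inputs chosen for easy arithmetic, not measured values.
    years = divergence_time_years(differences=80, sites=16_000,
                                  rate_per_site_per_year=1e-8)
    print(f"estimated time since common ancestor: {years:,.0f} years")
```

The estimate is only as good as the assumed rate, which is one reason the validity of such clocks remains under debate.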

SOURCES 

B. Alberts, D. Bray, J. Lewis, et al., "The Evolution of the Cell," in Molecular Biology of the Cell (New York, NY: Garland Publishing, 1983).
R. Lewin, "Molecular Clocks Turn a Quarter Century," Science 239:561-563, 1988.

Box 3-F.—Molecular Anthropology

Anthropologists working in a central Florida bog recently discovered 8,000-year-old human skeletons 
with well-preserved brains, some of which have provided the oldest available samples of human DNA. Be- 
fore this discovery, samples of DNA had been available only from the dried tissue remains of archaeological 
specimens from more arid regions. The fact that DNA can be preserved in other than excessively dry condi- 
tions greatly increases the number of archaeological sites at which more ancient DNA samples may be 
discovered. DNA fragments have also been prepared from Egyptian mummies, from an extinct animal called 
a quagga, and from a 35,000-year-old bison from Alaska. Biologists have been trying to clone these DNA 
fragments for use in studies of evolution. The sample from the extinct bison is likely to be old enough 
for comparison with modern buffalo DNA; this comparison may provide clues to the mechanisms of ge- 
nome evolution. The human DNA samples, although important discoveries, are too recent to be particularly 
informative in studies of molecular evolution. As methods for working with the DNA extracted from these 
ancient species are improved, and as more specimens are uncovered, the application of gene mapping and 
sequencing technologies to anthropology and archaeology will be more feasible. 

SOURCES 

B. Bower, "Human DNA Intact After 8,000 Years," Science News, Nov. 8, 1986, p. 293.

G.H. Doran, D.N. Dickel, W.E. Ballinger, et al., "Anatomical, Cellular and Molecular Analysis of 8,000-Year-Old Brain Tissue From the Windover Archaeo-
logical Site," Nature 323:803-806, 1986.



APPLICATIONS IN POPULATION BIOLOGY 



Population biologists study populations by 
analyzing many individuals. They are interested 
in similarities and differences among individuals, 
among groups, among varieties, and among spe-
cies. To address such questions as how geog-
raphy and environment affect inheritance pat-
terns of certain traits, a physical map and a
complete sequence of a single reference ge-
nome are not particularly valuable. It would
be more useful to have corresponding sequence
information from widely diverse geographical
areas, from various religious and ethnic sub- 
groups, and from all races (9). 

Population geneticists studying human beings, 
plants, or animals make great use of molecular 
markers— RFLPs and, increasingly, sequences of 
specific regions— to assess the extent of genetic 
variability (see box 3-G). Information on the same 
small chromosomal region (e.g., a gene or a re- 
gion important for gene expression) from many 
individuals might be more useful than informa- 
tion on larger chromosomal regions from a few 
persons (43). Genes for rare diseases are not all 
found in a single human genome: Sickle cell he-
moglobin, for instance, might not have been dis-
covered if only Northern Europeans had been
studied (4,9). 
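One common way to summarize variability at a single marker sampled from many individuals is expected heterozygosity, one minus the sum of squared allele frequencies. The Python below is a minimal sketch under the assumption that each individual has already been scored for one allele at one RFLP marker; the allele labels and sample counts are invented for illustration.

```python
from collections import Counter


def expected_heterozygosity(alleles: list[str]) -> float:
    """Gene diversity H = 1 - sum(p_i ** 2) over the alleles observed."""
    if not alleles:
        raise ValueError("at least one allele observation is required")
    counts = Counter(alleles)
    total = len(alleles)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


if __name__ == "__main__":
    # Hypothetical allele observations at one marker in two populations.
    population_a = ["A"] * 8 + ["B"] * 2
    population_b = ["A"] * 5 + ["B"] * 5
    print(f"population A: {expected_heterozygosity(population_a):.2f}")
    print(f"population B: {expected_heterozygosity(population_b):.2f}")
```

A measure of this kind depends on sampling many individuals at the same small region, which is the point the paragraph above makes about the limits of a single reference genome.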

Problems in population genetics that bear on
public health involve finding means for estimat-
ing human mutation rates,¹ for studying suscep-
tibility to pathogens such as the virus responsi-
ble for acquired immune deficiency syndrome
(AIDS), and for assessing possible environmental
influences on these phenomena. The mechanisms
generating physical variability among human be-
ings are by no means well understood and involve
not only genetic factors, but, among other things,
a complex set of environmental factors. DNA se-
quences from representative portions of many
human genomes would also be of more imme-
diate use than whole genome sequences for
monitoring the effect of specific environ-
mental factors on the structure of the human
genome (9).

¹An OTA report assesses these scientific issues: U.S. Congress, Of-
fice of Technology Assessment, Technologies for Detecting Herit-
able Mutations in Human Beings, OTA-H-298 (Washington, DC: U.S.
Government Printing Office, September 1986).

Box 3-G.—Implications of Genome Mapping for Agriculture

Since the dawn of agriculture, people have manipulated plants to enhance desired traits simply by 
observing the results of breeding, with no true understanding of the genetic principles involved. Many 
scientists working in the field of plant molecular biology believe that genome projects will have important 
implications for agriculture, by increasing knowledge about the genes that control or influence yield, time 
to maturation, nutritional content, resistance to disease, insects, and drought, and other factors in the pro- 
duction of crops. 

The first gene maps ever constructed were assembled as a result of a series of painstakingly detailed 
crosses of pea plants and statistical analyses of data carried out by Austrian monk Gregor Mendel. Mendel 
was the first to recognize that some traits could be transmitted according to regular hereditary patterns. 
All modern genetics and much of modern biology build upon the foundation laid by Mendel. 

Construction of RFLP marker maps has begun for corn, tomatoes, cabbage, and other crop plants. Such 
genetic maps give plant breeders the ability to use gene structure rather than observable characteristics
to develop new varieties of plants. This ability should facilitate the development of intricate strategies for
manipulating complex traits controlled by multiple, interacting genes. 

The availability of RFLP maps makes it possible to select for several unrelated traits simultaneously 
or to manipulate traits controlled by clusters of genes that interact in complex ways. Researchers have 
mapped three genes that control efficiency of water use (drought tolerance) and five genes that have a
major impact on flavor and soluble solids in tomatoes. Three genes that make a major contribution to insect 
resistance in tomatoes have also been mapped. A group of genes that influences yield has been found in 
corn. RFLP maps of genes influencing equally important traits are being developed for alfalfa, azaleas,
cucumbers, onions, roses, sugar beets, and grasses. 

Recently, there has been renewed interest in a small flowering plant called Arabidopsis thaliana, a duck-
weed in the mustard family. Although this plant has no obvious economic or nutritional value, it is a valu-
able research tool for plant molecular biologists. The Arabidopsis genome, at about 70 million base pairs,
is about 10 percent or less the size of some of the major crop plant genomes, such as cotton, tobacco,
or wheat. The small size of this plant makes it an important model system for studying general mechanisms 
of gene regulation that may be directly applicable to economically important but genetically less tractable 
plants. For these reasons, work has already begun on making complete genetic linkage and contig maps
of the Arabidopsis genome.

Chapter 4

Social and Ethical Considerations 



"Science is a match that man has just got alight. He thought he was in a room—in
moments of devotion, a temple—and that his light would be reflected from and display
walls inscribed with wonderful secrets and pillars carved with philosophical systems
wrought into harmony. It is a curious sensation, now that the preliminary sputter is
over and the flame burns up clear, to see his hands lit and just a glimpse of himself
and the patch he stands on visible, and around him, in place of all that human comfort
and beauty he anticipated—darkness still."

H.G. Wells, 1891

"The moral significance of humankind is no more threatened by peeking at the un-
derlying musical notation, the base sequences, than is reading the score of Beethoven's
last symphony diminishing to that piece of work."

Thomas H. Murray, 
Case Western Reserve University, 1987 



INTRODUCTION 



As projects to map and sequence the human ge- 
nome are undertaken, their long-range social and 
ethical implications need to be considered as part 
of policy analysis, yet further knowledge is needed 
before many of these implications emerge. Some 
will arise in the course of deciding what priority 
to give genome projects and what level of resolu- 
tion (coarse genetic linkage map, complete DNA 
sequence) is most appropriate. More profound 
ethical questions are posed by possible applica-
tions of genetic data for altering the basis of hu-
man disease, human talents, and social behavior.
Questions about personal freedom, privacy, and
societal versus individual rights of access to genetic 
information are among the most important. A full 
picture of the human genome will of necessity 
raise questions about the desirability of using 
genetic information to control and shape the fu- 
ture of human society. The complexity and ur- 
gency of these issues will increase in proportion 
to advances in mapping and sequencing. 

Part of the reason for studying genomes is to 
see how variations in genes account for differ-
ences among people. Some of the issues raised
in this chapter relate specifically to these varia- 
tions: What will be the impact of discovering that, 
in their genetic endowment, human beings are
either more equal or more unequal than we now
suppose? Other problems do not concern genetic 
differences, but rather the impact of discovering 
the extent to which genes do or do not limit the 
options of human beings in general. One commen-
tator has argued that scientists bear a responsi-
bility for using "moral imagination" to anticipate
the full range of uses and consequences of their 
work, especially when that work is in the basic 
sciences (2). 

The social considerations raised by genome proj- 
ects include ethical issues. Ethical issues often arise 
in the context of debates about values, principles, 
or human actions that have had particular merit 
in the past. Such debates about what ought to be 
done often cannot be resolved by empirical in- 
quiry. Specific genetic information such as the 
location of a gene along a chromosome or the se- 
quence of nucleotide bases composing a specific 
gene is value-neutral and as such is not ethically 
troublesome. However, questions about private 
investment versus the allocation of Federal re- 
sources or about the proper use and availability 
of genetic information are ethical questions be- 
cause they involve choices among actions based 
upon competing notions about what is good, right, 
or desirable.

Competing ideas about the desirable course of
human action are developed from considerations
about the greater good, personal freedom, bene- 
fiting others, avoiding harm, and fairness and 
equality. It is important to note that the ethical 
issues surrounding the use of and access to genetic 
information are not unique to the enterprise of 
mapping and sequencing the human genome (10) 
(see box 4-A). The existing uses of genetic screen-
ing, which in most cases are based on incomplete
information about the location of a specific gene, 
already raise ethical questions. In addition, some 
general ethical questions are moot because of con- 
temporary realities; for example, the question of 
whether there should be any human genome map-
ping and sequencing activities at all. This ques-
tion is moot because mapping and sequencing proj-
ects have been underway for over a decade and
there has been no concerted effort to prohibit
them. The more immediate questions, therefore,
are how these projects should best proceed from
now on and what use should be made of new
genetic information.

Box 4-A.—DNA Fingerprints

DNA fingerprints are derived from traces of human biological material such as blood, semen, hair,
or other tissue. Recombinant DNA technology is applied to these samples to identify patterns of genetic
sequence that are unique to each human being. Matched DNA fingerprints can establish the identity of
a given individual with near certainty. DNA fingerprints, therefore, have great practical use in establishing
the identity of criminals, family members, or bodily remains.

Genetic fingerprinting raises ethical issues such as the maintenance of personal autonomy when tissue
samples are requested for identification purposes and the maintenance of confidentiality of individual genetic
profiles. Even after tissue specimens have been discarded, there is considerable fear that genetic records
will be retained in spite of the wishes of the human source of the tissue. California requires convicted
sex offenders to give blood and saliva samples before their release from prison. The provision of such
samples also makes it possible to discover information that may be incidental to past criminal records (e.g.,
XYY chromosome, drug use) but that could be used against the present or former inmates.

In the United States to date, practical applications of DNA fingerprinting have involved tests of specific
suspects or known criminals. There are plans in California to store this information in the world's first
computerized data bank of DNA fingerprints. In Great Britain, however, a DNA analysis of blood samples
from all men and boys between the ages of 13 and 30 in Leicester County was conducted in an attempt
to identify the person who raped and murdered two teenage girls. A 17-year-old boy originally charged
with the crimes was released when his genetic profile did not match that derived from the semen left
in the victims. More conventional investigative methods were later used to catch a suspect, a local baker
who had avoided the test. The mass screening effort left investigators with a genetic profile on every young
man in the county, information they later destroyed.

DNA fingerprinting has also been used as proof of paternity for immigration purposes. In 1986, Bri-
tain's Home Office received 12,000 immigration applications from the wives and children of Bangladeshi
and Pakistani men residing in the United Kingdom. The burden of proof is on the applicant, but establish-
ing the family identity can be difficult because of sketchy documentary evidence. Blood tests can also be
inconclusive, but DNA fingerprinting results are accepted as proof of paternity by the Home Office.

Testing of extended families has been used in Argentina to identify the children of at least 9,000 Ar-
gentinians who disappeared between 1975 and 1983, abducted by special units of the ruling military and
police. Many of the children born to the disappeared adults were kidnapped and adopted by military "par-
ents," who claimed to be their biological parents. Once genetic testing of the extended family revealed
the true identity of the child in question, the child was placed back in the home of its biological relatives.
It was initially feared that transferring a child from its military "parents," who were kidnappers but who
had nevertheless reared the child for years, would be agonizing. In practice, the transferred children be-
came integrated into their biological families with minimal trauma.

Each of the following sections begins with a list 
of important social and ethical questions, followed 
by a short general discussion establishing the con- 
text of these issues and, in some cases, outlining 
opposing arguments. Decisions about mapping 
and sequencing rest in part on arguments about 
appropriate allocation of resources. Arguments
about access to versus control of knowledge turn
on debates about the relative importance of ethi-
cal principles such as autonomy (that is, self-
determination or personal freedom of action) and
beneficence (the duty to act in ways that benefit
and do not inflict harm on others). There is gen- 
eral concern about the ways in which personal 
freedom of action might be either enhanced or 
diminished by increased knowledge about human 
genetics. Finally, there is significant concern about 
the possibility of eugenics, that is, that new and 
existing information will be used in attempts to 
improve hereditary qualities. The social and ethi- 
cal arguments relevant to mapping and sequenc- 
ing the human genome reveal the tension between 
an attempt to urrivp : t some clear insight about 
duties and obligations and an attempt to weigh 
benefits versus harms. The purpose of this chap- 
ter is to describe and clarify important points of 
social and ethical controversy, not to resolve them. 



BASIC RESEARCH 



• How should the conduct of research in the 
basic sciences, such as genome mapping and 
sequencing, be influenced by a concern for 
the social good? 

• What are the considerations when basic re- 
search in the biological sciences seems to take 
resources away from areas of research that 
might have more immediate social benefit? 

A genetic linkage map of the human genome
already exists and progress has been made in the
development of a physical map. Practical debate,
therefore, centers on questions about the most
efficient and effective way to develop the complete
physical map, that is, whether the whole human
genome should be sequenced in a systematic
way and how new genetic information should
be applied.

How these questions are answered depends 
upon the values attached to scientific progress and 
the relationship between scientific progress and 
human good. There is a strong argument that basic 
scientific research is valuable in and of itself and 
should be pursued for its own sake. Coordinated, 
systematic mapping of the human genome is con- 
sistent with this view, and proponents argue for 
resources and against constraints in the name of 
conducting good science. Others argue that scientists
need to be responsive to and sometimes
even constrained by the public interest (7).



LEVELS OF RESOLUTION 



• What level of resolution of the physical map
is really needed, and for what purposes?

While even a rough genetic map, permitting the
identification of markers linked with major diseases,
might prove useful to insurers or others
bent on identifying high-risk individuals, it would
have less value for basic researchers than a more
precise map. From an ethical standpoint, the key
arguments about levels of resolution, or molecular
detail, are based on the distribution of costs
and benefits involved. If the public is asked to pay
an appreciable portion of the cost, then it deserves
to participate in the political debate about embarking
on an expensive, full-scale project. Scientific
and technical factors being equal, chromosomal
regions in which greater clarity would benefit
many people (e.g., those associated with prevalent
genetic diseases) might be addressed first. If
the largest share of the costs is borne by the private
sector, then few, if any, questions of priority
will be posed, other than those chosen by the persons
investing in the projects.






ACCESS AND OWNERSHIP 



• What are the ethical considerations pertaining
to control of knowledge and access to information
generated by mapping and sequencing
efforts?

• Who should have access to map and sequence 
information in data banks? 

• Do scientists have a duty to share informa- 
tion; what are the practical extent and limits 
of such an obligation? 

• Who owns genetic information? 

• Do property rights to individuals' genetic identities
adhere to them or to the human species
(14)?

• Is genetic information merely a more detailed 
account of an individual's vital statistics, or 
should this information be treated as intrin- 
sically private, not to be sought or disclosed 
without the individual's express consent (10)? 

There is a method in scientific research that allows
investigators to pursue their hunches, test
their hypotheses, replicate their results, and publish
their findings in roughly that order. Careful
adherence to this process ensures accuracy and
the orderly development of knowledge. The time
lag between discovery of new information and
communication of it, however, has caused some
commentators to question whether scientists have
the right to withhold information about genetic
markers that might be of great interest to the public
at large.



From an ethical perspective, it may be argued
that genetic information is by definition in the public
domain: The human genome is a collective property
that should be held in common among all
persons of human heritage (8). An opposing argument
is that, since gene sequences are not commonly
knowable and understanding them requires
the use of expensive and often patentable
machinery, discovery of sequences and the fruits
that derive from them belong to the person who
uncovered them. By this reasoning, it does not
matter whether the sequences are unique or how
they might be used; it is the labor and inventiveness
associated with the discovery of them that
makes them valid intellectual property. Current
patent law takes the latter tack but limits patentability
by preventing the patenting of a person
or an idea.

One prominent scientist has acknowledged the
public's special claim to the genome but argues
that a public enterprise may not be the best way
to satisfy this claim and that delay on so urgent
a project serves no one (5). A significant portion
of the value of the genetic information gathered
through human genome projects will not be fully
realized until some decades after the projects are
completed, but there is little doubt that it will help
elucidate the function and physical location of
genes that cause or predispose to illness and disease.
For this reason alone, the sequences will have
substantial commercial value.



COMMERCIALIZATION 



• What facets, if any, of human genome map- 
ping and sequencing activities should be com- 
mercialized? 

The commercial value of genome sequences has 
already been recognized by companies that have 
applied for patents on a number of specific mate- 
rials and techniques. At least one company has 
argued that it has the right to copyright and control
the materials and maps that it develops (5).

The selective forces of the marketplace have
generated a database network, some portions of
which are in the public domain and others of
which are held by individual companies. The ethical
issues of privatization of this knowledge turn
on the importance of sequences lost to others by
academic communities or corporations which
have restricted the use of them. On one level, the
problem is largely academic, since the data needed
for a complete map and sequence could be assembled
by the public sector, with duplication or purchase
of the data held by private parties. On
another level, however, the potential loss of critical
data, the duplication of effort, and the control
of knowledge raise serious questions about a combined
scheme of public versus proprietary holding
of fundamental knowledge. There is a strong
argument that parts of research that are funded
publicly should yield public information, while allowing
scientists and others to retain the benefits
of commercial exploitation of inventions.



DIAGNOSTIC/THERAPEUTIC GAP 



• What are the ethical implications of the grow- 
ing gap between diagnostic and therapeutic 
capabilities? 

• Should diagnostic information about genetic 
disorders for which there is no therapeutic 
remedy be handled differently from that 
about disorders for which there are therapeu- 
tic interventions? 

There is no doubt that continuing scientific advances
in mapping and sequencing the human
genome accelerate diagnostic applications. One
philosopher has noted that the ability to map the
human genome yields information about susceptibility
that is more precise, more certain, and
potentially more threatening to individual freedom
and privacy than earlier methods of presymptomatic
diagnosis and vague hypotheses about
familial traits (10). A related issue is the need to
protect information that may be available to or
sought by third parties such as insurance companies
or employers. Progress to date indicates that
the ability to diagnose a genetic abnormality precedes
the development of therapeutic interventions
and that this gap may be growing. This is
true for many genetic diseases, an important example
being Huntington's disease (see box 7-A in
ch. 7).



PHYSICIAN PRACTICE 



• Do physicians and other health care providers 
face a conflict between an increasingly reduc- 
tive approach to medical science and a focus 
on holistic patient care (17)? 

Increased information about human genetics
changes attitudes and alters the knowledge that
serves as a basis for health care interventions. Physicians
and other health care providers must constantly
alter their views and understanding of human
behavior, health, and disease. There are many
examples of diseases that were once thought to
be amenable to preventive health care that are
now known to have a genetic component or cause.
On a practical level this presents obvious difficulties,
as health care providers are increasingly uncertain
whether they are dealing with patterns
of health and illness in individuals that can be
ameliorated by changes in life style and medical
treatment or whether such patterns are in large part a
matter of genetic destiny. In addition, the ethical
principle of respect for persons indicates that individuals
must be treated with care, compassion,
and hope because they are persons and not merely
the embodiments of a genetic formula or code.



REPRODUCTIVE CHOICES 



• What ethical considerations arise from the
increased ability of parents to determine the 
genetic endowment of their children (through 
such practices as selective termination of 
pregnancy, selective discarding of human em- 
bryos created in vitro, or selection of X- or 
Y-bearing sperm to determine the sex of the 
child)? 



The ethical question of one generation's duties 
and obligations to another becomes more evident 
as genome mapping generates data pointing to the 
serious consequences of certain cultural practices 
or mating patterns. For example, it has been 
demonstrated that, if it were possible to choose 
the sex of their children, many individuals and 
couples would prefer that their firstborn be male 
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(18). It has also been demonstrated that firstborn 
children benefit from their early period of exclusive
parental attention. If firstborn boys became
the norm, it might further compromise equality
of opportunity between men and women (16). In
such circumstances, the conflicts among values
and ethical principles such as autonomy, justice,
and beneficence will be strong. Human mating
that proceeds without the use of genetic data
about the risks of transmitting diseases will
produce greater mortality and medical costs than
if carriers of potentially deleterious genes are
alerted to their status and encouraged to mate
with noncarriers or to use artificial insemination
or other reproductive strategies (3).

On a practical level, the availability of informa- 
tion that couples might use to select embryos cre- 
ated in vitro has been hampered by an absence 
of federally funded research concerning many as- 
pects of human fertilization. There has been a de 
facto moratorium on such research since 1980 
(13). 



EUGENIC IMPLICATIONS 



• What ethical concerns arise from possible eugenic
applications of mapping and sequencing
data?

The possibility of mastery and control over human
DNA once again raises the highly charged
issue of genetic selection. One major difference 
between current and previous attempts at eugenic 
manipulation is that any potential eugenicist will 
have substantially more powerful techniques to 
effect desired ends and more data with which to 
muster support. With even the modest knowledge 
achieved in their first century, genetic techniques 
have become sophisticated enough to permit the 
use of selective breeding to produce animals with 
desired qualities. 

When Francis Galton defined eugenics in 1883
as the science of improving the "stock," he intended
the concept to extend to any techniques
that might serve to increase the representation
of those with "good genes." Thus, he indicated
that eugenics was "by no means confined to questions
of judicious mating, but takes cognisance of
all the influences that tend, in however remote
a degree, to give the more suitable races or strains
of blood a better chance of prevailing speedily over
the less suitable than they otherwise would have
had" (4). Prior to the development of recombinant
DNA technology, eugenic aims were primarily
achieved by attempting to control social practices
such as marriage. New technologies for identifying
traits and altering genes make it possible for
eugenic goals to be achieved through technological
as opposed to social control.



Knowledge of human genetics will amplify the
power to intervene in the diagnosis and treatment
of disease. Each time a person who would otherwise
have died of a disease caused or influenced
by a gene is treated successfully by genetic or
nongenetic means, the frequency of that gene in the
population increases (Lappe, see app. A). Human
genome projects will intensify and accelerate the
already difficult debates about who should have
access to one's genetic information by providing
faster and cheaper methods of testing for genetic
variations, by making much more information
available, and by increasing the specificity of
genetic information (15). The ethical debate about
eugenic applications more properly focuses on
how to use new information rather than on
whether to discover it. Eugenic programs are
offensive because they single out particular people
and therefore can be socially coercive and
threatening to the ideas that human beings have
dignity and are free agents.



Positive Eugenics

Beginning with Plato, philosophers have recog- 
nized that eugenic ends could be achieved through 
subtle or direct incentives to bring together 
presumptively fit human beings. Positive eugenics
is defined here as the achievement of systematic 
or planned genetic changes in individuals or their 
offspring that improve overall human life and 
health and that can be achieved by programs that 
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do not require direct manipulation of genetic ma- 
terial. 

Most commentators have rejected or cast doubt
on any uses of genetic engineering to enhance or
directly improve the human condition. The President's
Commission for the Study of Ethical Problems
in Medicine and Biomedical and Behavioral
Research declared that efforts to improve or enhance
normal people, as opposed to ameliorating
the deleterious effects of genes, are at best problematic
(11).

It may well be that the problem with positive
eugenics has more to do with the means than with
the ends. The basic objective of improving the human
condition is generally supported, although
debates about just what constitutes such improvement
continue. Many concerns about eugenic policies
in the past focused on the methods used to
attain them, such as sterilization, rather than on
the ends themselves.

Negative Eugenics 

Negative eugenics refers to policies and programs
that are intended to reduce the occurrence
of genetically determined disease. It implies the
selective elimination of gametes (ova or sperm)
and fetuses that carry deleterious genes, as well
as the discouraging of carriers of markers for
genetic disease from procreation. There are few
technical obstacles to karyotyping human beings
for eugenic reasons. Verbal genetic histories of
sperm donors, for example, are designed to exclude
donors carrying some genetic diseases. Such
a screening process, accompanied by a physical
examination and laboratory tests, has already been
recommended by the Ethics Committee of the
American Fertility Society (1). The development
of specific genetic tests could make gamete screening
easier and more specific and will also expand
existing capabilities to conduct prenatal tests.



Eugenics of Normalcy 

The third eugenic use of genetic information 
would be to ensure not merely that a person lacks 
severe incapacitating genetic conditions, but that 
each individual has at least a modicum of normal
genes. One commentator has argued that individuals
have a paramount right to be born with a
normal, adequate hereditary endowment (6). This
argument is based on the idea that there can be 
some consensus about the nature of a normal 
genetic endowment for different groups of the 
human species. The idea of genetic normalcy, once 
far-fetched, is drawing closer with the develop- 
ment of a full genetic map and sequence; how- 
ever, concepts of what is normal will always be 
influenced by cultural variations and subject to 
considerable debate. 



ATTITUDES 



• How will a complete map and sequence of 
the human genome transform attitudes and 
perceptions of ourselves and others? 

One of the strongest arguments for supporting 
human genome projects is that they will provide 
knowledge about the determinants of the human 
condition. One group of scientists has urged sup- 
port of human genome projects because sequenc- 
ing the human genome will provide one of the 
most powerful tools humankind has ever had for 
deciphering the mysteries of its own existence (12).

The relevance of this proposition will depend
on the degree to which complex human behaviors
are determined by understandable genetic factors.
It will also depend on how important human genome
projects are to understanding genetic factors
for complex traits. Whether higher human
attributes are reducible to molecular constructs
is a topic of considerable debate in the philosophy
of biology, and human genome projects would
doubtless enlarge and intensify this debate. A reasonable
hypothesis is that, while little information
of direct or immediate value regarding complex
behaviors is likely to result from human
genome projects, insights into the possible construction
of control regions for the development
of the human embryo, the genetic basis for organizing
neuronal pathways, and the genetic control
of sexual differentiation will all be significantly
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enhanced. In the long run, knowledge of human 
genetics will make scientific understanding of hu- 
man life more sophisticated. 

A greatly increased understanding of how genes
shape characteristics could influence human beings'
attitudes toward themselves and others
(Glover, see app. A). Such increased understanding
might highlight the degree to which genetic
factors are equal or unequal for traits that confer
social advantage. This information might reveal
that human beings have fewer options than
they suppose and could thereby encourage a determinist
view of human choices (see box 4-B); or
it could reveal just the opposite. A general increase
in genetic information might also alter social customs
based on erroneous scientific assumptions.

Many individuals have general beliefs about 
their genetic potential for achievement in certain 
spheres of activity, about the limits of possible 
improvement through effort or environmental 
change. These intuitive beliefs are often vague and 
inaccurate. Often, it is only in regard to a few skills
or characteristics that individuals have pushed
against the limits of their potential. When science
makes it possible to trace the actual limits of individuals,
intuitive perceptions may turn out to be
wrong. This has the potential of both enhancing
and limiting personal liberty. 



Box 4-B.— Determinism and the Human Genome

Determinism in biology is the general thesis that, for every action taken, there are causal mechanisms 
that preclude any other action. Mapping and sequencing the human genome will not alone impose a
determinist view of human nature. Seeing where genes are located, or knowing the order of bases in the DNA,
will not alone make behavior predictable. 

But mapping and sequencing together with tracing the pathways between genes and behavior will start 
to paint a determinist picture. Scientists are now starting to work out these pathways. Take, for example, 
the pattern of behavior classified by psychiatrists as sensation seeking, which involves a disposition toward 
gambling and alcoholism. This behavior is correlated with low levels of activity of the platelet monoamine 
oxidase. These levels of activity have been shown by studies of twins to be largely under genetic control. 

In a determinist model, human actions can be explained in terms of causal mechanisms, even though 
those mechanisms may be very complex. If this model is right, it seems that what human beings do, just 
as much as what billiard balls do, is the product of a set of laws operating in particular circumstances. 

This view of human nature is disturbing. It suggests that a Godlike scientist, with complete knowledge 
of all the relevant causal laws and of the circumstances in which they operate, could successfully predict 
human action. In two different ways, determinism is at least an apparent threat to our attitudes. First, 
the elimination of genuine choice would leave no room for the belief that we can partly create, actualize, 
or modify ourselves. Second, undermining choice may also undermine many emotional reactions to others. 
The determinist picture may not leave room for justifiable resentment of what people do or for justifiable 
feelings of blame or guilt. 

There are alternative views within determinism. Hard determinism is the view that individual choice 
is entirely ruled out, along with the emotional responses linked to holding people responsible for what 
they do. Soft determinism asserts that free choice and responsibility are compatible with determinism. 

The issue is whether the soft determinist can resist the hard determinist's argument against freedom
and the reactive attitudes. There are two strategies for resisting: 1) to point out that determinism is not
the same as fatalism, that even in a deterministic world what human beings do influences the future; and
2) to disagree that determinism eliminates genuine choice, attempting to work out a model of free action
that is compatible with determinism.

SOURCES:

Office of Technology Assessment, 1988.
Glover, see app. A.
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ROLE OF GOVERNMENT 



• What is the proper role of government in
mapping and sequencing the human genome?

• Specifically, does the government have a role
in deciding what data should be collected in
gene mapping and sequencing? How should
this information be disseminated and guarded
from abuse?

The lines of power, coercion, and authority in
the public and private scientific sectors are blurred
because the first genetic maps are being made in
corporations (e.g., Collaborative Research, Inc.)
and in private philanthropies based in universities
(e.g., the Howard Hughes Medical Institute
at the University of Utah).

The ethical arguments for involving the Federal
Government in the process of genome mapping,
whether by shaping, constraining, blocking, or
doing nothing, center on the public interest in
making resources available in ways that are consistent
with the considerations of beneficence, justice,
and autonomy. These issues encompass academic
freedom or freedom of scientific inquiry
because the projects have universal and lasting
implications. Once the human genome is mapped
and sequenced, the resulting data will have widespread
implications for generations to come
(Lappe, see app. A).

The precise boundary between basic and applied
science is hard to draw, but there is enough
understanding of where it lies to be able to use
it as a basis for policy. A case might very well be
made for a government policy that would leave
basic research unrestricted but that would place
some stringent controls on applied research and
technological applications, for example, by ensuring
that genetic testing is voluntary and access
to data is controlled.

All research carries with it the likelihood of 
changing one's conception of the world and so 
of changing one's attitudes. For these reasons,
there is a strong case against government inter- 
vention to stop research. There are four main 
arguments: 

1. Stopping research might be opting for comfortable
ignorance or illusion rather than uncomfortable
truth. The growth of science has
rested on the preference for uncomfortable
truth. Those who view science as one of mankind's
finest creations will be dismayed at any
wholesale repudiation of this preference.

2. It is unlikely that existing world views, beliefs,
and attitudes can be protected by shutting
down basic research. The knowledge that
such protection was needed might itself start
to undermine existing views.

3. As a practical matter, it may be that government
cannot stop basic research. It is not easy
to monitor what goes on in laboratories, and
what is stopped in one country may take place
in another (Glover, see app. A).

4. Stopping research blocks both possible benefits
and risks. The belief that research can
be performed to permit benefits while coping
with and occasionally avoiding risks is a
matter of historical precedent.



DUTIES BEYOND BORDERS



• What, if any, ethical issues are raised when
considerations of international competitiveness
influence basic scientific research?

• What, if any, are the duties and obligations
of the United States to disseminate mapping
and sequencing information abroad?

• What are the implications of shared information
for international competitiveness?

• What are the international implications of
sharing technological applications of mapping
and sequencing information?

• What issues are involved when applications
of genetic information or biotechnology that
are of great use to Third World countries are
not developed or fully exploited because they
are less profitable for industrialized countries?

The United States has recently proposed an in- 
ternational framework of rules for science. The 
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purpose of this framework is to see that all na- 
tions do their fair share of basic research and that 
all the results of such research be made public, 
except for those with strategic implications (9). 
The increased protection of intellectual property 
and patent rights for technological innovations 
formed the basis of this proposal; these rights were 
also central to recent international trade talks. 
There is some sentiment that barriers to the trans- 
fer of technology would continue even if there 
were no reward for intellectual property. One 
commentator has noted that, unless products are
protected by a set of principles now, basic scien- 
tific results could become increasingly restricted; 
some nations might do less basic research and in- 
stead emphasize applying other nations' results (9). 



The most common single-gene defects, disorders
of the hemoglobin molecules that carry oxygen
in red blood cells, are highly prevalent in many
nations in Southern Europe, Africa, the Middle
East, and Asia. Such nations would benefit most
if research tools became widely available as they
were developed and if priorities for which chromosomal
regions are mapped first took world
prevalence of disorders into account. Use of map
and sequence information by developing nations
may also require special attention to devising
screening tests that are cheap and simple, and
might entail access to services (e.g., sequencing
or mapping) located in developed nations
(Weatherall, see app. A).



CONCLUSION 



All human beings have a vital interest in the social
and ethical implications of mapping and sequencing
the human genome. It is not surprising,
therefore, that there are debates about how genome
projects should proceed. These extend beyond
considerations of scientific efficacy and involve
the interests of patients, research subjects,
physicians, academicians, lawyers, entrepreneurs,
and politicians. Mapping the human genome accelerates
our rate of understanding, and the distance
between increased understanding and direct
intervention to alter the human genome is
shrinking. Add to this the development of scientific
tools such as gene probes, and immediate
practical questions are posed: How should basic
research be conducted? What level of resolution
in mapping is necessary? Who should have access
to and ownership of data banks and clone repositories?
How should thorny questions surrounding
commercialization be handled? Long-range
questions about eugenics, reproductive choices,
the role of government, and possible duties and
obligations beyond national borders also arise.
These questions are complex and are not likely
to be resolved in the near future. It will therefore
be necessary to ensure that some means for explicitly
addressing ethical issues attends scientific
work.
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Chapter 5 

Agencies and Organizations 
in the United States 



"Science has been a formative factor in making both the Federal Government and 
the American mind what they are today. The relation of the government to science 
has been a meeting point of American political practice and the nation's intellectual life." 

A. Hunter Dupree,
Science and the Federal Government
(Baltimore: Johns Hopkins University Press, 1986), p. 2.



Projects involving or related to mapping or sequencing
the human genome can be found in several
Federal agencies and nongovernment organizations
in the United States. Activities at the
principal agencies and organizations will be briefly
reviewed in this chapter. They include:

• Federal agencies: 
—the National Institutes of Health, 
—the Department of Energy,
—the National Science Foundation,
—the National Bureau of Standards,
—the Centers for Disease Control, 
—the Department of Defense, 
—the Office of Science and Technology Policy, 
—the Office of Management and Budget, 
—the Domestic Policy Council; 
• Nongovernment organizations: 
—the Howard Hughes Medical Institute, 
—the National Research Council, 
—private corporations, 
—private biomedical research foundations and 
other philanthropies. 



NATIONAL INSTITUTES OF HEALTH 



The National Institutes of Health (NIH) form one 
branch of the Public Health Service of the U.S. 
Department of Health and Human Services. They 
are administered by a director, currently James 
Wyngaarden. NIH is a highly decentralized confederation
of institutes, divisions, bureaus, and
the National Library of Medicine (see figure 5-1). 
The principal mission of NIH is to conduct and 
support biomedical research to improve human 
health. 

NIH was established in 1887. Since World War II, 
it has become "the foremost biomedical research 
facility not only in the United States but in the 
world" and "a brilliant jewel in the crown" of the 
Federal Government, according to Wilbur Cohen, 
former Secretary of the Department of Health, 
Education, and Welfare (10). 

The institutes with the largest budgets for
genetic mapping and DNA sequencing are the National
Institute of General Medical Sciences, the
National Cancer Institute, the National Institute
of Allergy and Infectious Diseases, the National
Institute of Child Health and Human Development,
and the National Institute of Neurological and Communicative
Disorders and Stroke (see table 5-1).
In fiscal year 1986, NIH supported approximately 
3,000 projects that involved mapping or sequenc- 
ing, with a combined budget of $294 million (out 
of a total budget of $5.26 billion, of which all but 
5 percent went for research activities) (18). NIH 
estimated it spent $313 million for such projects 
in fiscal year 1987 (18). 

Planning at NIH is decentralized. The Office of 
the Director has responsibility for overall direc- 
tion, but most programmatic decisions are made 
in the institutes, which are autonomous and 
largely control their own budgets. A 1984 report 
by the Institute of Medicine remarked on the "ab- 
sence of the trappings of bureaucratic authority; 
hence the Director manages largely on the basis 
of persuasion, consensus, and knowledge" (11).

Figure 5-1.—Organization of NIH

Table 5-1.—NIH Support for Mapping and Sequencing,
Fiscal Year 1986 (millions of dollars)

                            Human     Nonhuman
Institute                   research  research    Total

NIGMS                         12.4      99.6      112.0
NCI                           18.3      24.2       42.5
NIAID                          6.0      28.0       34.0
NICHD                         11.8      18.2       30.0
NINCDS                        10.7      10.6       21.3
Other institutes and
  divisions and the NLM       31.9      22.1       54.0
Total                         91.1     203.0      294.0

Abbreviations: NIGMS = National Institute of General Medical Sciences,
NCI = National Cancer Institute, NIAID = National Institute of Allergy
and Infectious Diseases, NICHD = National Institute of Child Health and
Human Development, NINCDS = National Institute of Neurological and
Communicative Disorders and Stroke, NLM = National Library of Medicine.

SOURCE: Office of the Director, National Institutes of Health, May 1987,
as modified by the Office of Technology Assessment.
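
The arithmetic in table 5-1 can be cross-checked with a short script. The sketch below is illustrative only: the variable and function names are hypothetical, and the NIAID human-research figure is assumed to be 6.0, the value consistent with that row's reported total.

```python
# Figures transcribed from table 5-1 (millions of dollars, fiscal year 1986).
# Each entry is (human research, nonhuman research, reported total).
# Assumption: the NIAID human-research figure is 6.0, the value that is
# consistent with that row's reported total of 34.0.
TABLE_5_1 = {
    "NIGMS": (12.4, 99.6, 112.0),
    "NCI": (18.3, 24.2, 42.5),
    "NIAID": (6.0, 28.0, 34.0),
    "NICHD": (11.8, 18.2, 30.0),
    "NINCDS": (10.7, 10.6, 21.3),
    "Other institutes, divisions, and NLM": (31.9, 22.1, 54.0),
}
REPORTED_TOTALS = (91.1, 203.0, 294.0)


def row_discrepancies(table):
    """Return, per institute, reported total minus (human + nonhuman)."""
    return {name: round(total - (human + nonhuman), 1)
            for name, (human, nonhuman, total) in table.items()}


def column_sums(table):
    """Sum each of the three columns across all institutes."""
    return tuple(round(sum(row[i] for row in table.values()), 1)
                 for i in range(3))


if __name__ == "__main__":
    print(row_discrepancies(TABLE_5_1))
    print(column_sums(TABLE_5_1), "vs. reported", REPORTED_TOTALS)
```

Every row adds up exactly. The column sums come to 91.1, 202.7, and 293.8, within 0.3 of the reported totals, a gap that is presumably due to rounding in the published figures.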



Coordination of the various institutes is accomplished
largely by the Office of the Director. In
October 1986, the Advisory Committee to the Director
held a meeting at NIH entitled "The Human
Genome," at which views about setting NIH
policy were presented by many experts. Statements
both in favor of and against special initiatives
were aired (22). A working group of NIH administrators
was formed subsequent to that
meeting. The working group is chaired by the director;
other members represent several of the
institutes and divisions most directly involved.¹
This working group is responsible for setting overall
policies for NIH in connection with human genome
projects, and it initiated two program announcements
in 1987 (17). Included in NIH's
related research figures are several grants to produce
physical maps of other organisms or parts
of human chromosomes, to develop cloning or
DNA detection techniques, and to develop other
relevant technologies. The new genome programs
are to develop new methods for analysis of complex
genomes and to improve computer representation
and analysis of information derived from
molecular biology (12). These solicitations for
proposals were not associated with any new or
additional funding in 1987, but Congress has set
aside $17.2 million for them in 1988. The budget
request for fiscal year 1989 is $28 million. To review
complex genome and informatics proposals,
NIH will convene new peer review committees.

¹Other members of the NIH working group on the human genome
are Ruth Kirschstein (Director of the National Institute of General
Medical Sciences), Betty Pickett (Director of the Division of Research
Resources), Duane Alexander (Director of the National Institute of
Child Health and Human Development), Donald A.B. Lindberg (Director
of the National Library of Medicine), Jay Moskowitz (Associate
Director for Program Planning and Evaluation), and George Palade
(Yale University). Rachel Levinson (Office of the Director) is executive
secretary.

The NIH also plans to seek advice from outside 
scientists and to keep congressional staff abreast 
of genome projects through a series of workshops,
the first of which was held February 29 and March
1, 1988.

The Institutes 

The National Institute of General Medical Sci- 
ences (NIGMS) supports research and training in 
the basic biomedical sciences fundamental to un- 
derstanding health and disease. Its primary func- 
tion is to support research projects conducted by 
scientists throughout the nation and the world 
that can serve as the bases for the more disease- 
oriented research undertaken by the other NIH 
institutes. NIGMS will administer the funds set 
aside for characterization of complex genomes. 
Unlike most of the other NIH institutes, NIGMS 
has no intramural research program— its fund- 
ing is for work done by non-NIH scientists. 

NIGMS supports a major share of basic research
in genetics, including research on nonhuman spe- 
cies. Such work is concentrated at NIGMS because 
the institute is responsible for research related 
to fundamental biology or a broad array of dis- 
orders rather than to a disease group, develop- 
mental stage, or organ system. Genetics under- 
lies many physiological processes and can explain 
many disease states, but most fundamental 
genetics research is not designed to elucidate a 
single disease; rather, it elucidates general mech- 
anisms or illuminates how human diseases might 
occur by showing how other organisms function. 
Understanding other organisms is often the first 
and most important step in understanding human 
health and disease, but the details of how knowledge
about bacteria, yeast, or animals will relate
to human biology are rarely known in advance.
These are some of the reasons that NIGMS sup- 
ports such a large share of the work on genetics 
of nonhuman organisms. 

Each NIH institute other than NIGMS has as its 
mission the support of research on a range of dis- 
eases. The range of diseases may be defined by 
organ system, developmental stage, explicitly 
named disease group, or other criteria (see fig- 
ure 5-1). The distinction between the kinds of work 
supported by NIGMS and by the other institutes 
is not hard and fast; in fact, support extends over
a broad range of scientific projects that could come 
under the aegis of NIGMS or one of the other in- 
stitutes. The National Institute of Child Health and 
Human Development (NICHD), for example, has 
a program that investigates the basic molecular 
biology of development. In connection with this, 
NICHD convened in May 1987 a meeting of scien- 
tists working on human chromosome 21. Chro- 
mosome 21 is of special interest to persons doing 
research on Down's syndrome, Alzheimer's dis- 
ease, and several other diseases; it is also of inter- 
est because it contains the genes underlying sev- 
eral important and well-characterized biochemical 
processes. 

All of the institutes support genetic research (in 
fact, other institutes support more of it in the ag- 
gregate than NIGMS), but this research is often 
directed at finding the location of a particular 
disease-associated gene. (For example, study of the 
familial form of Alzheimer's disease is supported
by the National Institute of Neurological and
Communicative Disorders and Stroke, the National
Institute on Aging, and the National Institute of
Mental Health.) Institutes develop preventive diagnostic
tests and therapies for genetic diseases.

Funding Mechanisms

Spending at NIH is predominantly for investiga- 
tor-initiated, basic, undirected research. Most 
projects are related to human diseases, animal 
models of human disease, or fundamental re- 
search on biological questions that might il- 
luminate human biology in health and disease. 
NIH's primary funding mechanism is the
investigator-initiated scientific grant (classified as R01 by
the NIH bureaucracy and widely known by that
term) awarded to a single investigator or small
group. The typical R01 grant (and there are now
more than 6,400 of them) is given to a research
scientist at a university or other research center
in response to a proposal submitted by that scientist.
The proposal outlines the research question
addressed, the approach to the question, the
people who would work on the project, and the
budget for the project. The average grant amount
for projects that involved mapping or sequencing
was $130,000 in 1986 (5).

Some efforts— those with a specific purpose— 
are more amenable to funding by contract. The 
GenBank® database of nucleic acid sequences, for
example, is supported by this mechanism under
a $17.2 million 5-year contract with Intelligenetics
Corp. of Mountain View, California (with a subcontract
to the Los Alamos National Laboratory,
where GenBank® is housed). NIGMS administers
the GenBank® contract and is the principal funding
unit, with contributions from other NIH institutes
and divisions, the Department of Energy, and
the National Science Foundation. NIGMS also maintains
the Human Mutant Cell Repository under
contract. This is a resource for persons attempting
to use genetic techniques to understand diseases
or physiological processes. In these instances
and others, NIH can contract with a provider to
deliver a service.

Each NIH institute other than NIGMS administers
a program of intramural research, in most
cases located on the NIH campus. Investigators 
are employed directly by NIH. The intramural re- 
search programs of NIH collectively constitute the 
largest biomedical research facility in the world. 
NIH's intramural research complements its ex- 
tramural support of university and research cen- 
ter scientists. Components of human genome 
projects that require direct management by NIH 
or that would be best integrated into existing pro- 
grams could be added to the intramural research 
programs. 

NIH is not often associated with large, centrally 
administered programs, but it does support many. 
The National Cancer Institute, for example, sup- 
ports a number of centers that bring research, 
training, information dissemination, and clinical 
application under one roof or administrative
arrangement in order to accelerate the communication
of ideas among normally disparate groups.
Program and center grants are typically larger 
than investigator-initiated grants and can include 
funds for training as well as for equipment and 
research materials. A concerted research program 
has recently begun to combat AIDS (acquired im- 
munodeficiency syndrome). NIH has the capac- 
ity to direct a research program that requires co- 
ordination and some central planning. 

Research Infrastructure

A small but important fraction of NIH funding 
goes to support a research infrastructure— 
resources used by a wide array of scientists and 
clinical investigators to facilitate their research.
Much of the support for a research infrastruc- 
ture comes from the Division of Research Re- 
sources (DRR) at NIH. Databases for genetic in- 
formation, funding for repositories (e.g., for 
human cell lines, DNA clones, and probes), and 
support of the National Library of Medicine are 
also important components of the research infra- 
structure for mapping and sequencing. 

The DRR supports regional and national centers 
with various purposes. It is divided into five pro- 
grams, several of which support projects relevant 
to mapping and sequencing. One of the purposes 
of DRR-supported resources is to provide scientists
and clinicians with access to advanced research
technologies. This involves support of several
databases, materials repositories, computer
resource centers, and grants to generate and ana- 
lyze biomedical research data (see table 5-2). DRR 
cofunds with NICHD the Repository of Human 
DNA Probes and Libraries. This repository facilitates
exchange of research materials crucial to
genome projects. DRR also supports a grants program
to apply artificial intelligence and other sophisticated
approaches of information science to
understanding sequence data and managing large 
masses of biological information. Several data- 
bases, repositories, and activities supported by 
DRR are cofunded by other NIH institutes or agen- 
cies of the Federal Government. DRR funds its re- 
sources through grants and contracts, primarily 
to nongovernment scientists. It has helped fund
two workshops directly related to human genome
projects: one with the Department of Energy
(DOE) on materials repositories and databases, the
other with DOE, the National Library of Medicine,
and the Sloan Foundation on applying information
management systems to analysis of complex
biological problems.

Table 5-2.—Division of Research Resources Activities Related to Molecular Genetics

Resource                           Function                            Location

Protein Identification Resource    Database for protein sequences      Georgetown University,
                                   and software                        Washington, DC

Sequences of Proteins of           Annotated protein sequence file     Bolt, Beranek, & Newman, Inc.,
Immunological Interest                                                 Boston, MA

BIONET                             Network, database linkage, and      Intelligenetics,
                                   software for use in molecular       Mountain View, CA
                                   biology

Dana Farber Cancer Institute and   DNA sequence analysis software      Boston, MA and Houston, TX,
Baylor College of Medicine (with   and other computer resources        respectively
other NIH institutes)

National Flow Cytometry Resource   Chromosome and cell sorting         Los Alamos National Laboratory,
(with DOE, other NIH institutes)                                       Los Alamos, NM

DNA Segment Library                Distribution center for cloned      American Type Culture Collection,
(with NICHD, DOE)                  human DNA made by Los Alamos        Rockville, MD
                                   and Lawrence Livermore national
                                   laboratories

Cell Line Two-Dimensional Gel      Cell line analysis by protein       Cold Spring Harbor Laboratory,
Electrophoresis Database           electrophoresis in two dimensions   Cold Spring Harbor, NY

SOURCE: Office of Technology Assessment, 1988.

The National Library of Medicine (NLM) is the 
largest and most comprehensive collection of medical
information in the world. The library also supports
an extensive medical bibliographic resource:
the published Index Medicus and MEDLARS/
MEDLINE, the most widely used on-line computer
reference service for medicine and biomedical
research. The NLM has been called "the foremost
biomedical communications center in the world" 
(10) and the "central nervous system of American 
medical thought and research" (16). 

The library was started in 1818 as a few books
in the office of the Surgeon General of the Army,
Joseph Lovell. Its great flowering occurred under
John Shaw Billings in the period after the Civil
War, when it became an internationally recognized
medical library. The library was transferred
from the military to the civilian sector in 1956,
and its name was changed to the National Library
of Medicine through legislation sponsored by Senators
Lister Hill and John Kennedy. A new building
for the collection was constructed on the NIH
campus in 1962, and in 1980 the 10-story Lister
Hill Center was dedicated. The library became part
of NIH in 1968 (16).

The NLM's expertise lies in managing clinical
and biomedical research information. This includes
not only storage of books and journals, but
the publication of reference works that list the
extensive international biomedical literature and
the maintenance of computer databases that make
access to the medical information more efficient.
In recent years, the Board of Regents of the NLM
has pointed to biotechnology databases as an area
of expected future growth and has encouraged
library staff to provide improved access to databases
relevant to genetics, molecular biology, and
other aspects of the "new biology."

Late in the 99th Congress, Senator Claude Pepper
introduced a bill, the National Center for
Biotechnology Information Act of 1986, that would
give NLM responsibility to "develop new communications
tools and serve as a repository and as
a center for the distribution of molecular biology
information" (H.R. 99-5271). The bill was reintroduced
early in the 100th Congress with minor
modifications (H.R. 100-393), and a companion
measure with very similar provisions (S. 100-1354)
was introduced in the Senate by Lawton Chiles.
The bill was further amended and introduced as
S. 100-1966 jointly by Senators Chiles, Kennedy,
Domenici, Leahy, Graham, and Wilson in December
1987. These bills would make the NLM responsible
for improving access to the numerous
databases used in molecular biology and clinical
genetics, with funding authorized at $10 million
per year for fiscal years 1988 through 1992.
Appropriations for fiscal year 1988 included $3.83
million for these purposes (13).

The NLM has been conducting research on how 
to make human genetic information available to 
the medical community for several years. It has 
made Victor McKusick's Mendelian Inheritance 
in Man (15), the pivotal catalog of human genetic
loci identified by analysis of pedigrees, available
on-line through its Information Retrieval Experiment
program, and it has linked the data in this
volume to information available in GenBank® and
the Protein Identification Resource databank. The
library has also begun an experimental program
to link molecular biology databases, using
researchers on the NIH campus in Bethesda to test
the system. It plans to make DNA sequence and 
protein database analysis possible through a com- 
puter link to the National Cancer Institute's su- 
percomputer center in Frederick, Maryland. The 
Howard Hughes Medical Institute and the NLM 
have been discussing ways to link access to the 
various databases supported by NIH and the in- 
stitute. 



Peer Review 

The National Cancer Institute became the first 
American institution to routinely employ peer review
when it established the National Cancer Advisory
Council in 1937 (26). Since then, peer review
has become an essential element in allocating
funds for research grants at NIH. The review system
is two-tiered: The initial review is done by
study sections of scientific experts; the second tier
involves recommendations for funding made by
an institute advisory council.

Review of the typical grant involves several 
steps. A grant application is received by the Divi- 
sion of Research Grants at NIH from an investiga- 
tor (or from a program or center) under sponsor- 
ship of an institution. The application is then 
assigned to a group of scientists from a particular
discipline appointed by the Director of NIH.
These groups meet three times a year to review
grant applications. They assess applications for
their scientific merit (including originality, feasibility,
and importance), the competence of the investigators
to do the work, and the appropriateness
of the proposed budget (4). The study section
votes to approve, disapprove, or defer consideration
of an application. For grant applications that
are not deferred, a priority score ranging from
100 (best) to 500 is assigned, based on the rankings
of the individual members of the study section.
This priority score is then included in a summary
statement for each application that briefly
states reviewers' opinions. The summary (and
where necessary the full documentation) is then 
passed on to the appropriate advisory council. 

Each institute at NIH has an advisory council,
composed of eminent scientists and informed lay
members, that recommends applications for funding.
Advisory councils monitor the quality and fairness
of review by the study sections and assess
special relevance to important national health
needs and the mission of the institute. Members
of advisory councils are appointed by the Secretary
of Health and Human Services, except those
on the National Cancer Advisory Board, who are
appointed by the President. In most cases, the advisory
council approves the actions of the study sections.
Fewer than 10 percent of grant applications
are singled out for special discussion or action by
the advisory councils (26).

Staff of the NIH institutes then rank the approved
proposals. Priority scores are the main,
but not the sole, determinants of funding: An estimated
1 to 2 percent of proposals are funded because
of their particular relevance to a pressing
health need, the need to start research in areas
of future importance, a desire for balance in the
portfolio of grants supported by an institute, ethical
considerations, or importance to NIH program
needs (26). Roughly one in five grant applications
is referred to more than one institute by the study
section (11). A small proportion of applications is
funded by more than one institute; typically, however,
they are funded by one institute or are not
awarded.

Contracts and special programs also receive peer
review, usually through program review committees
organized by the institutes or divisions. The
intramural research programs are reviewed by 
non-NIH scientists who serve on boards adminis- 
tered by the institutes. Special review committees 
are also constituted by the institutes to review 
center or program grants. 

In its 1984 report on the organization of NIH, 
the Institute of Medicine noted that "the genius 
of the institution in shaping scientific excellence 
to health needs is found in the interplay between 
the categorical research institutes and the disciplinary
study sections" (11). This statement refers
to the fact that except for NIGMS, the NIH insti- 
tutes focus on a category of diseases or organ sys- 
tems. Study sections, in contrast, are composed 
of scientists from a particular discipline or area 
of expertise (e.g., genetics, pharmacology, pathol- 
ogy). These may overlap, but they often do not. 
Institutes consider applications from different 
study sections, sometimes from as many as 18 (11). 
In 1986, there were 2,700 scientists and lay rep- 
resentatives serving on 155 review committees 
at NIH (4,26). 



DEPARTMENT OF ENERGY 



Much of the attention devoted to mapping and 
sequencing the human genome can be traced to 
activities in the Department of Energy. DOE has 
already begun a program of targeted research on 
the human genome— the Human Genome 
Initiative— to construct physical maps of several 
human chromosomes and to develop relevant 
technologies. The part of DOE responsible for the
Human Genome Initiative is the Office of Health
and Environmental Research in the Office of 
Energy Research (see figure 5-2). 

The Office of Health and 
Environmental Research 

The history of the Office of Health and Environmental
Research (OHER) goes back to the Manhattan
Project of World War II, which was organized
to develop fission bombs. OHER began as
the Health Division, started in 1942 by Nobel laureate
Arthur Holly Compton, a physicist at the
University of Chicago. The division focused on protecting
people from the effects of radiation and
on the use of radioactive chemicals in medicine
and biomedical research. The research base was
broadened to include fossil fuels and renewable
energy sources by the Energy Reorganization Act
of 1974. These functions were retained when the
Energy Research and Development Administration
became the Department of Energy in 1977
and OHER was established (21). The primary mission
of OHER has been to study sources of radiation,
pollution, and other environmental toxins
(particularly those related to the generation of
energy), to trace them through the environment,
and to determine their effects. Another mission
is to exploit the resources of DOE-administered
national laboratories to the maximum benefit of
the nation.

Figure 5-2.—Organization of DOE

Research at OHER is conducted largely through
the system of national laboratories. There are eight
general-purpose laboratories that conduct OHER
research as well as research in physical sciences
and mathematics, and there are nine dedicated
OHER laboratories located near national laboratories
or universities. In addition, OHER supports
research at 100 universities and research centers.

OHER's involvement in the human genome debate
is traced by its former director, Charles
DeLisi, to an idea that occurred to him late in 1985
when he was reading a draft of the OTA report
Technologies for Detecting Heritable Mutations
in Human Beings (23,27). He realized the importance
of having a reference human sequence for
OHER's work. Subsequent discussions disclosed
that researchers at the Lawrence Livermore and
Los Alamos National Laboratories were thinking
about ordering DNA clones to make a physical
map as an extension of ongoing work. Robert Sinsheimer,
chancellor of the University of California
at Santa Cruz, had hosted a workshop on the
feasibility of sequencing the human genome the
previous year and was very interested. During
this period, Nobel laureate Renato Dulbecco published
a brief article in Science urging that the
human genome be sequenced (8).

DOE sponsored the Human Sequencing Workshop
in Santa Fe, New Mexico, in March 1986,
and DeLisi outlined a three-point strategy in a May
1986 memo: 1) to produce a set of overlapping
DNA clones related to a physical map of human
chromosomes (see ch. 2), 2) to develop high-speed
automated sequencing methods, and 3) to improve
methods for computer analysis of map and sequence
information. DOE also funded a second
workshop, Exploring the Role of Robotics and
Automation in the Decoding of the Human Genome,
in January 1987. Funding for the DOE initiative,
based on the three-pronged attack, began
in fiscal year 1987, with $4.2 million going to 10
projects at three national laboratories and at Harvard
and Columbia Universities (6). DOE plans to
spend $12 million on human genome projects in
fiscal year 1988 and has requested $18.5 million
for 1989. A special appropriation of $12.7 million
was added to the Office of Energy Research budget
to construct a building for an Institute of Human
Genomic Studies at Mount Sinai Medical Center
in New York. This resulted from a congressional
initiative, and operations of that institute
are not part of human genome projects sponsored
by DOE. The reason for the name of the institute
is unclear, but the institute will apparently house
clinical genetic services for its region (24).

OHER projects include assembling an ordered
set of overlapping DNA clones spanning human
chromosome 19 at the Lawrence Livermore National
Laboratory, making a similar clone set for
chromosome 16 at the Los Alamos National Laboratory
(using somewhat different techniques),
and constructing a physical map of chromosome
21 and chromosome X at Columbia University. Efforts
in 1988 will expand to include more university
groups and the Lawrence Berkeley National
Laboratory and other national laboratories. An
effort to sequence the genome of the bacterium
Escherichia coli by a new method is being supported
at Harvard University. Other projects include
construction of DNA clones covering the
full set of human chromosomes (not cataloged in
order) and development of new technologies for
sequencing, detecting, and analyzing DNA.

Early enthusiasm for mapping and sequencing
at the national laboratories stemmed largely from
existing OHER projects. In one set of projects,
laser-activated cell sorting was used to separate individual
chromosomes. Cell sorting began naturally
in the national laboratories because of easy access
to high-technology instrumentation and a
multidisciplinary blend of scientists, including
biologists, chemists, physicists, computer scientists,
engineers, and mathematicians. The first
fluorescence-activated cell sorter was developed at the
Los Alamos National Laboratory. This instrument
was used at the Lawrence Livermore and Los
Alamos National Laboratories to sort human chromosomes,
and these chromosomes were used to
produce sets of DNA clones. The effort was divided
into two phases.

The first phase was to make sets of small fragments
of cloned DNA (up to several thousand base
pairs) in lambda phage. This phase has been completed,
and the clone sets have been turned over
to the American Type Culture Collection in Rockville,
Maryland. (Preparation of the clones is
funded by DOE; storage and distribution of the
clone sets are funded jointly by NICHD and DRR.
DRR also supports the cell-sorter facility at the
Los Alamos National Laboratory.) The second
phase is to develop clone sets of up to 45,000 base
pairs using cosmids and other vectors (see ch. 2
for details). The next logical step is to order the
clone sets.

In future years, DOE plans to expand its efforts 
substantially. A report recently written by a sub- 
committee of the Health and Environmental Re- 
search Advisory Committee is the main public 
planning document for DOE work on human ge- 
nome projects. 

Health and Environmental Research 
Advisory Committee Report 

The Health and Environmental Research Advi- 
sory Committee (HERAC) is a group of scientists 
from universities, national laboratories, and pri- 
vate corporations which reports to the Director 
of the Office of Energy Research. Its main func- 
tion is to advise the Director of OHER on the sci- 
entific program supported by OHER. In late 1986, 
HERAC formed a subcommittee on the human ge- 
nome to make recommendations about DOE's Hu- 
man Genome Initiative. The subcommittee was 
chaired by Ignacio Tinoco and included members 
from the Howard Hughes Medical Institute, uni- 
versities, biotechnology companies, and one sci- 
entist from a national laboratory. The subcom- 
mittee's document was approved and was sub- 
mitted by HERAC to Alvin Trivelpiece, then Di- 
rector of the Office of Energy Research, in April 
1987 (25). 

The subcommittee report urges DOE to develop 
two important tools for research in molecular bi- 
ology: a reference human DNA sequence and the 
means to interpret and use it. These would be 
created by a new research program divided into 
two stages. The first phase (5 to 7 years) would 
focus on: 

• assembling ordered DNA clone sets of the hu- 
man chromosomes; 

• locating genes and other markers on a physical
map based on these sets;

• producing sequences of selected clones and 
distributing that information; 

• developing new techniques for mapping and 
sequencing; 

• applying automation and robotics to mapping 
and sequencing; 

• creating computational and other methods 
for identifying genes; 

• finding new algorithms for analyzing DNA 
sequences; and 

• establishing computer facilities, databases, 
materials repositories, networks, and other 
resources to promote use of the methods and 
resources produced by the projects. 

Budget recommendations for the first phase are 
noted in table 5-3. The second phase would provide
a complete sequence for each human chromosome
and would make new technologies available
for use in addressing the central questions
of medicine and biology. 

The subcommittee recommends that the work 
be widely distributed among national laboratories, 
universities, and companies because of the "highly 
creative nature" of the science needed to meet 
the objectives. The research program would in- 
clude work by many small groups funded through 
investigator-initiated grants, as well as larger mul- 
tidisciplinary centers or consortia. The report also 
recommends that DOE establish a two-tiered sys- 
tem of peer review: one or two initial review com- 
mittees to assess technical merit and feasibility, 
and a policy committee to determine overall strat- 
egy, develop policy, and oversee scientific review. 



Table 5-3. Budget Proposed for DOE Human 
Genome Initiative (millions of dollars) 

Fiscal year    Amount that year    Cumulative amount 

1988                  20                    20 
1989                  40                    60 
1990                  80                   140 
1991                 120                   260 
1992                 160                   420 
1993                 200                   620 
1994                 200                   820 
1995                 200                 1,020 



SOURCE: Subcommittee on the Human Genome, Health and Environmental 
Research Advisory Committee, Report on the Human Genome Initia- 
tive, prepared for the Office of Health and Environmental Research, 
Office of Energy Research (Germantown, MD: Department of Energy, 
April 1987). 

The subcommittee urges DOE to ensure that 
the results of the projects be in the public domain 
and that the efforts be made in cooperation with 
those of other agencies in the United States and 
abroad, within the constraints of Federal law 
governing technology transfer and concern for 
national competitiveness in biotechnology. 
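
The cumulative column in table 5-3 is a running total of the yearly amounts. A minimal sketch in Python (the variable names are illustrative and do not come from the subcommittee report) recomputes that total from the eight yearly figures, in millions of dollars:

```python
from itertools import accumulate

# Yearly amounts proposed in table 5-3 (millions of dollars), in table order.
annual = [20, 40, 80, 120, 160, 200, 200, 200]

# Running total: each entry is the sum of all yearly amounts up to that row.
cumulative = list(accumulate(annual))

for amount, total in zip(annual, cumulative):
    print(f"{amount:>5} {total:>7,}")
```

The last running total is the figure behind the "over $1 billion" recommendation attributed to the outside panel.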

A broad-based research program to foster de- 
velopment of technology is outlined, followed by 
the rationale for DOE involvement, namely: 1) the 
historical relation to ongoing work at the national 
laboratories; 2) DOE's experience with directed 
research programs (as opposed to the much larger 
and more diverse human and animal research sup- 
ported by NIH); 3) the relation to the mission of 
OHER (in assessing mutational damage from ra- 
diation and environmental exposure or develop- 
ing new energy resources); and 4) access to mul- 
tidisciplinary teams in the national laboratory 
system. The potential utility of DNA sequencing 
for monitoring exposure to radiation and toxic 
chemicals is noted as a principal reason for de- 
veloping sequencing technologies. 



The primary justification for the new initiative 
is its potential utility. The technologies and infor- 
mation deriving from it would make future re- 
search more efficient (less costly and more power- 
ful), would directly improve human health, and 
would aid economic growth of industries depen- 
dent on biotechnology. A final section of the re- 
port warns that, although the program is of the 
highest priority, it should not be permitted to hin- 
der worthwhile ongoing programs, including re- 
search on nonhuman organisms. Concern that a 
large new program at DOE would impede devel- 
opment in other fields is countered with the ob- 
servation that large new sums of money have al- 
ready been introduced into molecular biology: 
HHMI has increased its annual spending on bio- 
medical research by over $150 million during the 
last decade, with primarily beneficial results. The 
subcommittee ends by stating its opposition to cre- 
ating any large, inflexible organization to execute 
or supervise the work. 



NATIONAL SCIENCE FOUNDATION 



The blueprint for the National Science Founda- 
tion (NSF) grew from the report Science, the End- 
less Frontier, written by Vannevar Bush in 1945 
(2). The original ideas for NSF, as propounded by 
Bush and Senator Harley Kilgore, were modified 
by postwar events and eventually led to legisla- 
tion creating the foundation in 1950. The prin- 
cipal purpose of the NSF was to continue the Fed- 
eral Government's role in sponsoring basic 
research, a role that developed during World 
War II (9,14). Biology at NSF is supported through 
its Directorate of Biological, Behavioral, and So- 
cial Sciences. In fiscal year 1987, NSF spent an 
estimated $32.7 million on research related to gene 
mapping and sequencing. Of this amount, only 
$200,000 went for focused projects on gene map- 
ping and sequencing of nonhuman organisms; the 
bulk was for basic research ($13.7 million) and 
for the research infrastructure, such as develop- 
ment of methods, new scientific instruments, data- 
bases, and repositories and support of instrumen- 
tation centers ($19 million). Planned spending for 
1988 was $37.9 million. These figures are part of 
the $206 million spent by NSF in support of bio- 
logical science in fiscal year 1987, out of the total 
NSF budget of $1.62 billion (12). 

NSF supports primarily basic research in all sci- 
ences. Support of basic research grants is the 
largest single component of NSF funding related 
to human genome projects. In recent years, NSF 
has increased its emphasis on engineering and 
technology development (e.g., it partially sup- 
ported development of the California Institute of 
Technology's DNA sequenator). In 1987, NSF an- 
nounced a Biological Centers Program intended 
to stimulate the growth of knowledge in biologi- 
cal research areas important to the continued de- 
velopment of biotechnology. Support for these 
centers, estimated at $12 million for fiscal years 
1987 and 1988, constitutes the second largest com- 
ponent of NSF funding of genome-related activi- 
ties. A center for bioprocess engineering has been 
functioning at the Massachusetts Institute of Tech- 
nology for several years. (NIH has also supported 
this center, for research training.) Two types of 






centers are to be created under the Biological 
Centers Program: One will focus upon sharing 
capital-intensive instrumentation and developing 
new instruments; the other will host large-scale 
multidisciplinary research. Either could be used 
by groups mapping and sequencing various organ- 
isms. NSF also sponsors a program on biological 
instrumentation and funds individual grants for 
basic biological science. Although the NSF bud- 
get for biology is small relative to its support for 
other areas and to DOE and NIH support, it 
nonetheless supports mapping and sequencing 
through bioengineering, basic biology research, 
and the centers programs. 



NATIONAL BUREAU OF STANDARDS 



The National Bureau of Standards (NBS) was cre- 
ated in 1900 as the National Standardizing Bureau. 
It is part of the Department of Commerce, and 
its primary mission since its inception has been 
to develop standards in scientific and technical 
fields in order to facilitate industrial progress and 
to prevent incompatibilities that could hamper re- 
search or technological applications. NBS also has 
a program of research in methods and instrumen- 
tation that has grown naturally out of tracking 
diverse and rapidly advancing technologies. Its 
main technical expertise lies in the physical, chem- 
ical, and information sciences, but it is now de- 
veloping expertise in biotechnology. It has joined 
with the Montgomery County Government and 
the University of Maryland, for example, in sup- 
port of the Center for Advanced Research in Bio- 
technology in Gaithersburg, Maryland. 



NBS has been suggested as a candidate agency 
for quality control and research on measurements 
for DNA mapping and sequencing. This would give 
it the function in biology that it has for physics 
and chemistry but would entail a considerable ex- 
pansion of its expertise and resources devoted to 
molecular biology. Its role could include check- 
ing data for accuracy, assessing the accuracy of 
the machines used in the multicenter mapping and 
sequencing efforts, and setting standards for the 
reporting of results. NBS might also conceivably 
develop technical standards for automated ma- 
chines and computers used in creating or analyz- 
ing data about DNA. If NBS undertakes a func- 
tion in quality control and standard setting, it will 
need close collaboration with NIH and DOE, where 
the bulk of expertise currently lies. 



CENTERS FOR DISEASE CONTROL 



The Centers for Disease Control (CDC) are situ- 
ated in the Public Health Service of the Depart- 
ment of Health and Human Services. The main 
offices are located in Atlanta, Georgia. CDC is the 
Nation's primary resource for tracking the inci- 
dence and prevalence of diseases and for inter- 
vening to thwart the spread of infectious agents 
and preventable diseases. Related to this mission, 
CDC maintains databases, disseminates informa- 
tion, and provides materials widely used in clini- 
cal research. 

CDC could have a role in quality control and 
in monitoring scientific activities involved in map- 
ping and sequencing the human genome. It has 
performed this function in the past, through its 
Lipid Standardization Program. This program be- 
gan more than 25 years ago to provide quantita- 
tive measurements for laboratories engaged in 
lipid research related to diseases of the heart and 
blood vessels. Since the program was initiated, 
over 500 national and international laboratories 
have received and analyzed reference materials 
provided by CDC (3). Quality control and stand- 
ard setting may become important as map and 
sequence data become more plentiful and as more 
laboratories come to rely on a common set of data. 
If such measures prove necessary, CDC is a possi- 
ble agency for determining or confirming the 
chromosomal location or origin of DNA fragments 
or for orienting new DNA fragments on the emerg- 
ing physical maps. If this function were to be un- 
dertaken by CDC, close communication with NIH 
and DOE would be necessary. 



DEPARTMENT OF DEFENSE 



While biological research is not the main mis- 
sion of the Department of Defense (DoD), some 
components of mapping and sequencing DNA 
might be shared with or conducted by various 
components of DoD. Each military service (par- 
ticularly the Army and Navy) conducts some re- 
search in biology, primarily that related to the 
health needs of military personnel or to defenses 
against chemical and biological warfare. DoD 
reports that all such research is unclassified. Much 
of it is conducted at military facilities or contractor- 
administered laboratories, but some of it is con- 
ducted in universities as well. DoD supports some 
generally useful resources in biomedical research. 

The Armed Forces Institute of Pathology (AFIP) 
is an international treasure house of tissue sam- 
ples and microscopic slides spanning the full range 
of human disease. Its tissue collection is used by 
pathologists and biomedical researchers through- 
out the world. AFIP began as the Army Medical 
Museum in 1862. It became the AFIP in 1949, 
when the Navy and Air Force joined with the Army 
in support of it, and the role of the institute has 
expanded steadily since then. Today AFIP consti- 
tutes the largest organization of research and diag- 
nostic pathologists in the world. The institute has 
received more than 2.2 million cases (tissue or 
slides from patients) from over 50,000 patholo- 
gists affiliated with more than 19,000 hospitals 
and clinical facilities. AFIP's unique capabilities as 
a tissue repository have been expanded to include 
modern storage techniques. Through further ex- 
pansion of its capabilities, the staff and facilities 
of AFIP could be used as a national tissue reposi- 
tory and assessment center for the full spectrum 
of human diseases. The institute could play a role 
in linking map and sequence data to human dis- 
eases. The availability of systematically classified 
human tissues could facilitate development and 
testing of medical products and diagnostic meth- 
ods to probe the molecular basis of various 
diseases. 

The military biomedical research community 
would have an interest in map and sequence data 
because investigations of the effects of chemical 
and biological weapons would include the study 
of genes that are particularly vulnerable to attack 
and the construction of vaccines or other defen- 
sive measures. 



OFFICE OF SCIENCE AND TECHNOLOGY POLICY 



The Office of Science and Technology Policy 
(OSTP) is headed by the President's Science Advi- 
sor. OSTP's primary responsibility is to advise the 
President on science policy, on matters where sci- 
entific or technical information is relevant to Fed- 
eral policy decisions, and on national policies for 
technology development. OSTP can on occasion 
form coordinating councils under the Federal Co- 
ordinating Council for Science, Engineering and 
Technology. An example of OSTP coordination in 
life sciences is the Biotechnology Science Coordi- 
nation Committee, which started as an OSTP ini- 
tiative responsible primarily for devising guide- 
lines for regulation of biotechnology products. 



Representatives of OSTP followed the human ge- 
nome debate and spoke at several national meet- 
ings in 1986 and 1987. 



OSTP recently announced plans to reorganize 
its oversight of life sciences. It plans to form a 
Committee on Life Sciences for interagency com- 
munication and coordination, and it tentatively 
plans to establish subcommittees on specific topics. 
Genome projects have been noted as likely to ne- 
cessitate such a subcommittee, although its ex- 
act role and composition are not yet determined 
(7). 



DOMESTIC POLICY COUNCIL 



The Domestic Policy Council (DPC) is a cabinet- 
level group that reviews government activities. 
David Kingsbury of NSF, as acting chairman of 
the Biotechnology Working Group, gave a brief 
presentation on human gene mapping to the DPC 
in February 1987. An interagency subcommittee 
of this working group, chaired by NIH Director 
James Wyngaarden, was formed and met in May 
1987 to exchange information on agency activi- 
ties. NIH, DOE, NSF, the Food and Drug Adminis- 
tration, the U.S. Department of Agriculture, the 
Environmental Protection Agency, and the Office 
of Management and Budget were represented. 
The purpose of the subcommittee was to mini- 
mize duplication of effort among the agencies and 
to promote interagency communication. The sub- 
committee has subsequently been disbanded, to 
be replaced by the OSTP group noted above. The 
DPC will continue to keep abreast of developments 
on genome projects through the President's Sci- 
ence Advisor, who will administer the OSTP group 
(1). 



OFFICE OF MANAGEMENT AND BUDGET 



The Office of Management and Budget (OMB) 
monitors and coordinates the annual budget proc- 
ess for executive agencies of the Federal Govern- 
ment and oversees management of the agencies. 
Each year, every Federal agency prepares a bud- 
get request that is reviewed within the agency 
and then submitted to OMB. OMB reviews the re- 
quests and develops a budget for the President; 
this budget is submitted to Congress in January 
for the fiscal year beginning that October (al- 
though the process is late for fiscal year 1989 be- 
cause of delay in passing the 1988 budget). OMB's 
budget-coordinating function places it in the po- 
sition of arbiter among different agencies if there 
are conflicting priorities or potential duplications. 
By this mechanism, and by monitoring other activ- 
ities in multiple departments, OMB can encourage 
communication and coordination of activities. 
OMB has one budget officer for NSF, another for 
NIH, and a third for DOE. These officers are re- 
sponsible for other agencies as well, and the activ- 
ities related to human mapping and sequencing 
constitute only a small fraction of their total bud- 
get responsibility. Two officers in the OMB sci- 
ence office have taken primary responsibility for 
tracking the human genome budget submissions 
of all agencies (20). 



HOWARD HUGHES MEDICAL INSTITUTE 



The Howard Hughes Medical Institute (HHMI) 
was created in 1953 by aviator-industrialist How- 
ard Hughes. It is a medical research organization 
with an endowment of approximately $5 billion. 
HHMI has increased its research funding dramat- 
ically over the last decade, from roughly $15 mil- 
lion in 1977 to approximately $240 million in 1987. 

HHMI operates three programs (see figure 5-3). 
The first and largest is scientific research in 27 
laboratories located in hospitals, academic medi- 
cal centers, and universities throughout the United 
States. The second program, which supports the 
first research program and is integrated with it, 
includes a genome resources project, a research 
training program for medical students (jointly with 
NIH), and sponsorship of HHMI meetings and re- 
views. A third program will provide $500 million 
over the next decade through grants and special 
programs to support education in the medical and 
biological sciences. 

Under its first program, HHMI conducts re- 
search in five basic scientific areas: genetics, im- 
munology, neuroscience, cell biology and regu- 
lation, and structural biology. Several HHMI 
investigators are involved in genetic mapping and 
related computational research, physical mapping, 
and medical genetics. The principal HHMI centers 
for genetics are located at the University of Utah 
(Salt Lake City, Utah), the Baylor College of Medi- 
cine (Houston, Texas), the University of Michigan 
(Ann Arbor, Michigan), the University of Califor- 
nia (San Francisco, California), The Johns Hopkins 
University (Baltimore, Maryland), the University 
of Pennsylvania (Philadelphia, Pennsylvania), 
Brigham Hospital (Boston, Massachusetts), and 
Children's Hospital (Boston, Massachusetts). HHMI 
estimates that it expended $40 million for genetics 
research in 1987, of which $2 million to $4 mil- 
lion were devoted to finding and using DNA mar- 
kers and constructing genetic maps. 

The HHMI genome resources project has a cur- 
rent annual budget of over $2 million. Through 
that project, HHMI supports nonsequence data- 
bases relating to human genetics, including the 
Human Gene Mapping Library (New Haven, Con- 
necticut) and the On-Line Mendelian Inheritance 
in Man (Baltimore, Maryland) (see figure 5-2). 
HHMI also helps maintain a mouse genetics data- 
base at Jackson Laboratory (Bar Harbor, Maine) 
and collaborates with the Center for the Study 
of Human Polymorphism (CEPH) headquartered 
in Paris, France. CEPH is a critical collaborative 
institution that links several large groups work- 
ing on construction of human genetic maps. HHMI 
has participated in a large number of meetings 
on the human genome, including one it sponsored 
directly in July 1986 at NIH; at that workshop, 
strategies and policies for mapping and sequenc- 
ing were discussed. HHMI partially supported the 
ninth Human Gene Mapping Workshop, held in 
Paris in September 1987, which compiled data on 
international gene mapping activities since the pre- 
vious meeting in 1985. 



NATIONAL RESEARCH COUNCIL 



The National Academy of Sciences was estab- 
lished by Congress in 1863. President Lincoln 
signed the law that brought it into existence. The 
National Research Council (NRC) was established 
in 1916 to provide advice to the Federal Govern- 
ment about issues involving science. The principal 
impetus was the increasing relevance of science 
to preparations for World War I. The National 
Research Council now conducts many studies on 
issues relating to science and technology. It is orga- 
nized into several disciplinary groups. 

In August 1986, a group of scientists interested 
in issues surrounding genome projects met in 
Woods Hole, Massachusetts. This group agreed 
to formulate a proposal for a study by the NRC, 
which was presented to and approved by the Basic 
Biology Board of the Commission on Life Sciences 
in September 1986. A committee of distinguished 
scientists, chaired by Bruce Alberts, was appointed 
to consider the scientific issues connected with 
genome projects. The committee on mapping and 
sequencing the human genome held several pub- 
lic meetings in 1987 and released its report in Feb- 
ruary 1988 (19). 

The Alberts committee was composed of scien- 
tists of different backgrounds, with varying de- 
grees of direct involvement in mapping and se- 
quencing projects and with initially divergent 
views on genome projects. The committee reached 
consensus on several points during the course of 
its deliberations. The committee concluded that 
mapping, sequencing, and understanding the hu- 
man genome merited a special effort funded and 
organized specifically for this purpose. 

The committee's report recommends that the 
projects should begin with "a diversified, sustained 
effort to improve our ability to analyze complex 
DNA molecules," with a "focused effort that em- 
phasizes pilot projects and technological develop- 
ment." It lists the specific types of maps that would 
be useful as early genome projects, notes the im- 
portance of mapping and sequencing genomes of 
nonhuman organisms, and stresses the need for 
thorough peer review. The proposed projects dif- 
fer from ongoing research by focusing on meth- 
ods that would improve mapping, sequencing, 
analyzing, or interpreting the biological signifi- 
cance of information in the human genome by 
five- to ten-fold. The committee also notes the need 
for central databases, repositories, and quality con- 
trol facilities. 

Research projects that merit special support are 
explained in some detail. The committee favors 
development and refinement of techniques in the 
early years, with most support going to work on 
mapping large genomes. One specific goal in early 
years, for example, would be to enable sequenc- 
ing of 1 million continuous base pairs. The need 
for technological progress is noted for several ad- 
ditional areas: to isolate chromosomes, to create 
cell lines, to clone substantial portions of DNA 
from genomes of whole organisms, to clone DNA 
in large fragments, to isolate large DNA fragments, 
to order DNA clones derived from genomes, to 
automate many steps involved in mapping and se- 
quencing DNA, and to improve the collection, stor- 
age, dissemination, and analysis of information 
and materials. Administration of these centralized 
functions would be conducted by a scientific advi- 
sory board, including at least one full-time scien- 
tist appointed as chairman. This scientific advi- 
sory committee would also serve to advise the 
agencies and to act as the focal point for interna- 
tional cooperation. 

The committee recommended that $200 million 
per year be appropriated specifically for genome 
projects, increasing to this level over the first 3 
years. In the first 5 years, this might be spent to 
fund work at 10 medium-sized multidisciplinary 
centers and to support a program of grants to 
many more small research groups. An estimated 
1,200 scientists would be involved, with roughly 
half located at the multidisciplinary centers. The 
research component would account for $120 mil- 
lion per year. The remainder of the budget would 
be used for construction ($55 million per year ini- 
tially, decreasing in later years) and to pay for the 
repository, database, quality control, and admin- 
istrative functions of the scientific advisory com- 
mittee ($25 million per year). The funding for con- 
struction in early years would be reassigned to 
production of maps and sequence data as tech- 
nologies matured. 
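
The three components of the committee's proposed budget add up to the recommended $200 million per year. The following Python sketch is illustrative only: the dictionary keys are labels chosen here, and the per-center staffing figure is a rough derivation from the committee's estimates (about half of 1,200 scientists spread over 10 centers), not a number stated in its report.

```python
# Annual budget components recommended by the committee (millions of dollars).
budget = {
    "research": 120,
    "construction (initial years)": 55,
    "central functions": 25,  # repository, database, quality control, administration
}

total = sum(budget.values())
shares = {item: amount / total for item, amount in budget.items()}

for item, amount in budget.items():
    print(f"{item}: ${amount} million ({shares[item]:.1%})")
print(f"total: ${total} million")

# Rough staffing per center, assuming exactly half of the scientists
# work at the multidisciplinary centers (the report says "roughly half").
scientists = 1200
centers = 10
per_center = (scientists / 2) / centers
print(f"about {per_center:.0f} scientists per center")
```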

A majority of the committee recommended that 
a single agency be designated and given funding 
to lead the effort. Other options were also dis- 
cussed, including an interagency structure much 
like the task force option discussed in the next 
chapter. A final option was to have an interagency 
body for planning and funding, but a single agency 
for administration. 



PRIVATE CORPORATIONS 



Private corporations in several fields have ex- 
pertise relevant to mapping and sequencing the 
human genome. Many instruments first developed 
in academic or national laboratories are now 
produced commercially. Pharmaceutical and bio- 
technology companies could use map and se- 
quence information to develop new products, and 
many are themselves developing research tech- 
niques. Companies that market scientific instru- 
ments are also keenly interested. The role of the 
private sector appears to be primarily to: 

• advise in planning the mapping and sequenc- 
ing research program; 

• commercialize products that result from the 
research; and 

• ensure that technology transfer from feder- 
ally funded research projects to commercially 
exploitable products is smooth and rapid. 

Private corporations view human genome proj- 
ects, with a few exceptions, as long-term research 
that is best supported by the Federal Government. 
Corporations are unlikely to lend financial sup- 
port to a national program to map and sequence 
the human genome, although they might well in- 
vest in particular projects that involve develop- 
ment of technology. Private firms could perform 
specified functions under contract from the Fed- 
eral Government (e.g., genetic mapping, physical 
mapping, DNA preparation, or DNA sequencing) 
once the technologies are available. 

Several American companies already produce 
DNA sequenators and other analytical instruments 
used in mapping and sequencing projects. The 
Lawrence Livermore National Laboratory group, 
which is constructing ordered sets of DNA clones 
under DOE sponsorship, is modifying an instru- 
ment initially designed for DNA sequencing by 
Applied Biosystems of Foster City, California. Pri- 
vate corporations have likewise participated in 
building the existing genetic map of human chro- 
mosomes. Collaborative Research and Integrated 
Genetics are two companies based in the Boston 
area that have contributed substantially to the ef- 
fort to find new DNA markers and to link those 
markers to human diseases. A few biotechnology 
companies have been at the forefront in develop- 
ing automated technologies for handling DNA. The 
Genetics Institute (Cambridge, Massachusetts), for 
example, developed a robotic system that extracts 
DNA from bacteria and cells. 

At least two companies, the Genome Corp. in 
Boston and SeQ, Ltd., in Cohasset, Massachusetts, 
are being started specifically to map and se- 
quence the human genome. These companies plan 
to construct a physical map and subsequently se- 
quence the human genome over the next decade, 
using private funds. They would offer access to 
the materials and to the map and sequence infor- 
mation for a price. The process would be much 
like that used currently by researchers, who pay 
repositories for DNA clones, probes, and vectors 
or who pay companies for enzymes and other ma- 
terials used in molecular biology. The argument 
behind this is that, while each laboratory could 
conceivably develop the information independ- 
ently, it is cheaper and faster simply to buy it from 
a private firm that has developed it already. Those 
purchasing the information would be free to use 
it, but not to copy or sell it. 

Private corporations could also play a role in 
the development of technology related to map- 
ping and sequencing. This could include company 
access to government facilities, exchange of cor- 
porate and academic personnel, multicompany 
consortia, individual corporate agreements with 
universities or national laboratories, or some com- 
bination of these. 

Members of the Industrial Biotechnology Asso- 
ciation were recently polled regarding their sup- 
port of Federal initiatives in mapping and sequenc- 
ing the human genome. Those responding 
indicated that: 

• The work should be funded entirely by the 
Federal Government and should not interfere 
with ongoing biomedical research. 

• NIH, DOE, and NSF should all participate (NIH 
should take the lead). 

• A national planning committee, composed of 
50 percent university scientists, 30 percent 
government representatives, and 20 percent 
industry representatives, should be set up. 

• Work should be carried out at dispersed 
university and federally supported labora- 
tories, not a center created for the purpose. 

• International cooperation should be en- 
couraged if it does not entail delays. 

• Physical mapping should precede sequencing. 

Respondents clearly support a role for industry 
in planning and using the results of mapping and 
sequencing projects, while indicating that the Fed- 
eral Government should pay the bill. 



PRIVATE FOUNDATIONS 



Several private foundations support research 
in human genetics. These include such disease- 
oriented foundations as the March of Dimes, the 
Hereditary Disease Foundation, the Muscular Dys- 
trophy Association, and the Cystic Fibrosis Foun- 
dation. Other foundations support work on hu- 
man genetics as part of a broader research 
program, among them the American Cancer So- 
ciety, the American Heart Association, and the Alz- 
heimer's Disease and Related Disorders Associa- 
tion. These foundations, while relatively small in 
total funding, often act as catalysts in focusing 
research on a problem of particular interest. They 
are also highly effective at publicizing research 
results, educating the public about the conse- 
quences of disease, and generating public support 
for biomedical research. 



NIH, DOE, HHMI, and NSF have already made 
substantial commitments to projects related to the 
study of the human genome. Government activi- 
ties are currently being coordinated by informal 
communication among the agencies. A previous 
coordinating group under the Domestic Policy 
Council will likely be replaced by one organized 
under the Office of Science and Technology Pol- 
icy in the White House, with budget submissions 
coordinated by the Office of Management and 
Budget. Each research agency has its own means 
of funding research and providing peer review 
of programs. NIH and DOE have created special 
planning groups to review genome projects. NIH 
funding is over $313 million each year for human 
and nonhuman research involving mapping or se- 
quencing. In 1987, NIH announced two new pro- 
grams in methods development; it has budgeted 
$17.2 million for those projects in 1988 and re- 
quested $28 million for 1989. 



In 1987, DOE allocated $4.2 million for 10 
projects on physical mapping and technology de- 
velopment. It plans another $12 million in 1988 
and has received recommendations from an out- 
side scientific panel to ask for over $1 billion over 
the following 7 years. The former director of 
OHER stated that at least half the funds would 
be distributed to researchers at universities and 
research centers other than national laboratories 
(3) and that the work will be reviewed prospec- 
tively and retrospectively by peers. DOE officials 
have stated that their budget requests will be more 
modest than those recommended. 

NSF spent over $32.7 million on research re- 
lated to genome projects in 1987, although only 
$200,000 was considered to be for genome 
projects per se. NSF's new Biological Centers Pro- 
gram is likely to be relevant to genome projects, 
particularly those involving new instrumentation. 
HHMI funded $40 million of genetics research in 
1987, including several million for construction 
of genetic maps. HHMI also administers and funds 
a genomics resource program of $2 million annu- 
ally to support databases and other elements of 
the research infrastructure. 

To date; actions of the principal organizations 
can be described as cooperative. NIH and DOE 
have supported many joint efforts related to hu- 
man genome projects. The GenBank® database 
has been administered by NIH and located at a 
DOE-supported national laboratory for over 5 
years, and the two agencies also jointly support
DNA clone and probe repositories, computer anal-
ysis methods, and flow-sorting facilities. NIH and
DOE sponsored a meeting on database and repos-
itory needs of human genome projects in August
1987, and there has been an exchange of project
officers and extensive informal cooperation among
staff at NIH, DOE, NSF, and other executive
agencies. 

The strengths of NIH and DOE are more com- 
plementary than competitive. Each believes it 
could successfully mount and sustain the scien- 
tific and technical effort necessary for the con- 
templated mapping and sequencing projects. Both 
support relevant work already, although with
different emphases. A decision to delegate the en-
tire effort to one agency would require that cur-
rent efforts in other agencies be shut down.

The mapping and sequencing effort will more 
likely continue to include NIH, DOE, NSF, HHMI,
and other agencies and organizations covered in
this chapter. The key question then becomes how
much and which part each agency should per-
form. Such decisions will be made in a general
sense by Congress, through authorization and ap-
propriation, with more detailed planning left to
executive agencies. There may be informal coop- 
eration or some more formal means of coordinat- 
ing the planning and execution of agency projects 
under OSTP. Joint nongovernment advisory 
groups could be formed to bring in expertise from 
academia and industry. There are many options 
for organizing an interagency effort and for in- 
corporating outside advice into research planning. 
These issues of organization and advisory struc-
ture are discussed in chapter 6.

Chapter 6

Organization of Projects 



"Organization is a means to an end rather than an end in itself. Such structure is 
a prerequisite to organizational health; but it is not health itself. The test of a healthy
business is not the beauty, clarity, or perfection of its organization structure. It is the
performance of people."

Peter Drucker, 

Management: Tasks, Responsibilities, Practices
(New York: Harper & Row, 1974), p. 602



ADMINISTRATIVE STRUCTURES 



Chapter 5 presented the history, current involve- 
ment, and future plans of the many government 
and nongovernment parties interested in genome
research. This chapter assumes the continued in- 
terest and participation of the current actors, and 
it discusses the options for organizing those ac- 
tors at the Federal level. 

A properly designed administrative or organiza-
tional structure for genome projects is important.
As form so perfectly matches function in DNA,
so the organizational form of the project should 
match its function and goals. These include rapid 
accumulation of knowledge about the genome, 
efficient storage and distribution of that informa- 
tion, and conversion of this knowledge into 
productive theories, tools, reference materials, 
and medicines. The political consequences of a 
poorly administered group of projects are not only 
failure to achieve potential intellectual and eco- 
nomic contributions, but also negative impacts on 
the organization and funding of other scientific 
investigations. A genome project blueprint can- 
not be drawn without taking into consideration 
the abutting structures as well as the internal con- 
straints. 

Three major funding agencies must be included 
in any consideration of organizational design: the 
National Institutes of Health (NIH), the Department 
of Energy (DOE), and the National Science Foun- 
dation (NSF). Nongovernmental bodies such as the 
National Academy of Sciences (NAS) and the How- 
ard Hughes Medical Institute (HHMI) are already 
participating in organizational and advisory roles, 
and commercial firms anxious for sequencing
technology and data seek input as well.

There are at least five possible administrative
structures a human genome project could develop: 

• One agency— a project performed exclusively
by one of the expert agencies. 

• Single-agency leadership— a project in which
Congress would designate one agency to co- 
ordinate and oversee the research. 

• Interagency agreement and consultation— a 
cooperative project among the agencies in 
which no additional authority structure 
would be created. 

• Interagency task force— a project in which 
a committee with the authority to direct re- 
search planning among the agencies would 
be chartered. 

• Consortium— a project in which the private 
sector as well as the Federal Government 
would plan research, with possible cofund- 
ing from the corporate partners. 

The first alternative, a project organized and 
executed solely by one agency, may be dismissed 
as unnecessary and politically unworkable. A 
single-agency project could only result from cut-
ting out others, and several agencies have already
made substantial investments in genome research 
and related technologies. Further, the current ge- 
nome infrastructure, including GenBank® and 
DNA clone repositories, is already interagency. 

The other four proposals have unique strengths 
and weaknesses. For any of them to be success- 
ful, however, the administrative structure must 
at least organize communications at the scientific, 
interagency, and international levels. At most, it 
should be capable of planning a research program
involving many partners and funding them ac-
cordingly. Congressional decisions on the organiza-
tional structure can be based on perceptions of
the necessary patterns of authority, of quality and
scope of experience in research and development,
and of fiscal and economic priorities. 

Single-Agency Leadership 

One possible beginning for genome projects 
would be the designation by Congress of a lead 
agency to coordinate ongoing activities in various 
agencies (see figure 6-1). This option was the one 
favored by a majority of those on the National Re-
search Council committee that issued a report on
mapping and sequencing the human genome (18). 
The strengths of this organizational option derive 
from its clear designation of authority. Such 
leadership can be dynamic, and research would 
follow the theme established by the lead agency.
A lead agency, and thus a lead administrator, fo- 
cuses the project in all its aspects: It provides a 
communications link among researchers, domestic 
and foreign; a contact for media; and a target of 
criticism and politicking. Drawbacks to designat- 
ing a lead agency are the possibility of incomplete 
commitment by the lead agency and the poten- 
tial inability of the lead agency to command the 
resources of other agencies effectively. Choosing 
this option would necessarily entail choosing 
which agency should lead. 

Among NIH, DOE, and NSF— the three funding
agencies— NIH and DOE are the most appropri-
ate candidates to lead a genome project. NSF is
an unlikely leader because its mandate excludes
the investigation of human health and disease, the
ultimate focus of the projects. Further, a large-
scale operation conducted by NSF would inevita-
bly detract from other research of which NSF may
be the sole patron. Such a loss would occur in
each of the funding agencies involved, but NSF
may be most sensitive because it funds much less
biology than NIH, and the research it supports
is more basic as a rule than that supported by
DOE or NIH (30). NSF can contribute to genome
projects by stimulating interest in automation and
robotics (which it has done), in animal models of
human disease (by gathering animal and micro-
organism sequence data for comparison), and in
instrumentation (through its biology centers).

Figure 6-1.— Lead Agency

Choosing between NIH and DOE is troublesome 
because these agencies have complementary 
strengths and weaknesses. The project would have 
a different face with different leadership. 

Because of its mandate to support investigations
to improve the Nation's health, NIH dominates bio- 
medical research. The institutes spent an esti- 
mated $313 million in 1987 for projects that in- 
volved mapping or sequencing, over $90 million 
of which funded projects to characterize the hu- 
man genome (13,15). NIH has conducted, funded, 
and administered genetics research expertly for 
years, and the institutes would seem to be a natu- 
ral home for genome projects. The theme of an 
NIH-led project would likely be a renewed com-
mitment to the peer review system and to small,
or cottage industry, science, with some added at- 
tention to the research infrastructure. 

One great strength of NIH is its decentralized 
administration: Quality projects uninteresting to 
one institute may well be funded by another. This 
flexibility is achieved at a cost, however. Critics 
have scrutinized this process and concluded that,
among other faults, it cannot support a large, 
directed project (28). NIH leadership can have dif- 
ficulty imposing the priority decisions needed for 
a concerted effort. A distinction is often made be- 
tween the operating styles of NIH and NASA (the 
National Aeronautics and Space Administration), 
with NASA having much greater central author- 
ity and NIH exemplifying a decentralized process 
for setting priorities. To manage some of the pro-
posed genome projects, mechanisms beyond NIH's
standard researcher-originated format may be re-
quired. This could "require a change in NIH's phil-
osophical outlook and its approach," according
to George Cahill of HHMI (23).

NIH could conduct a directed research program.
It has done so in the past for the study of particu-
lar diseases (e.g., polio and cancer) and is now do-
ing so for AIDS. While NIH has not previously
mounted a major project to develop a set of tools
for biology (as mapping and sequencing projects
are often characterized), it has the funding mech-
anisms and expertise necessary to do so. The map-
ping and sequencing projects have been described
as a library of information awaiting translation,
and NIH administers the National Library of Medi-
cine (see ch. 5). To facilitate special genome
projects, NIH has created new study sections to
review grants that focus on methods; NIH could
also convene a new scientific advisory body in an
existing institute to direct a focused project, set
aside funds for special projects in one or more
institutes, and begin new centers or multidiscipli-
nary programs analogous to existing ones. It al-
ready has a multi-institute coordinating body to
develop special initiatives like those announced 
in May and October 1987 for analyzing complex 
genomes and for informatics in molecular biol- 
ogy. NIH is currently in the process of establish- 
ing a mechanism for obtaining outside advice. 

The high capital needs in some areas, the di-
verse expertise (extending beyond biomedical re-
search) needed on some research teams, and the
standardized and repetitive work of mapping and
sequencing may render small research groups un-
able or unwilling to do such work (8). Experience
at the National Cancer Institute with the Special
Virus Cancer Program has suggested that the
standard grant mechanism is insufficient for such
tasks as the production of standardized tools, the
distribution of clinical materials, and the increased
coordination of investigators (33). If the institutes
were to assign high priority to genome projects,
those projects could conflict with other major re-
search efforts, for example, research on AIDS.
Some persons, among them Ruth Kirschstein, Di-
rector of the National Institute of General Medi-
cal Sciences, have questioned whether "it would
be appropriate to have a specifically targeted pro-
gram that would compete with all the extraor-
dinarily important programs NIH funds" (23). The
danger is that a targeted program would become
an instead-of program rather than an in-addition-
to program, as was the case with the Special Virus
Cancer Program (33). 



As a lead agency, DOE would endow the genome
project with different characteristics of organiza-
tion and expertise. DOE has long supported re-
search on human mutations and DNA damage and
repair through the Office of Health and Environ- 
mental Research. The mission of OHER is to un- 
derstand the effects of radiation and other means 
of energy generation on human health and the 
environment. OHER views ignorance of the ge- 
nome and the inability to sequence and analyze 
DNA rapidly as major limitations on its research. 
As NIH might emphasize the human disease as-
pects of genome research, DOE would emphasize
the investigation of mutagenesis and other areas
closely related to OHER's mission. Critics have
characterized OHER's rationale as "clearly imprac-
tical" (17) and "forced and . . . disingenuous" (30).
But because of OHER's interest in human genetic
material, DOE already has established expertise
in crucial technologies such as automated chro-
mosome and cell sorting, and in the computer stor-
age of genetic data. DOE believes that, through
its national laboratory structure, it should develop
methods and tools useful to the entire commu- 
nity of molecular biologists (27). 

The strengths and weaknesses of DOE are 
largely complementary to those of NIH. DOE's
strength is its familiarity with the administration
of focused research programs. It manages many
large facilities for research in physics and chem-
istry—such as accelerators for high-energy
physics— and the scientists whom DOE funds in 
these areas are among the best in the world. DOE 
also maintains excellent computing resources. Yet 
DOE does not have the same stature within the 
community of molecular biologists that NIH does
(11,16,24,30).

The national laboratories have long provided 
services to the community of molecular biologists 
that are not provided by other agencies. The na- 
tional laboratories have pioneered many high- 
technology instruments useful in biology: zonal 
centrifuges, high-pressure liquid chromatography,
fluorescence-activated cell sorters, and chromo-
some sorting. Teams at national laboratories have
prepared sets of DNA clones from individual hu-
man chromosomes, and current mapping projects
are logical extensions of this work. Even though 
the national laboratories are not renowned for
their expertise in molecular biology, some of the
technology, analytical software, and new meth-
ods that need to be developed will not be in bio-
logical disciplines— they will involve engineering,
physics, and mathematics, all areas of acknowl- 
edged national laboratory expertise. 

DOE enjoys the reputation of being a proficient 
organizer of projects among government, univer- 
sity, and industry researchers. As an agency, it 
is experienced in managing large projects and dis- 
bursing large sums of money, extramurally to 
universities and research centers and intramurally 
to the national laboratories. Critics fear that, if 
DOE assumes leadership of genome projects, its 
bias toward central management will corrupt re-
search and stifle the more traditional, perhaps
more creative, cottage industry approach. 

DOE's review process for genome projects 
would involve prospective and retrospective peer 
review. The degree of scrutiny is not likely to differ
substantially from that at NIH. Funding through 
DOE would be less likely to sap other biomedical 
funds, but other biological research programs at 
DOE could suffer. Designating DOE as the lead 
agency would give the organizational lead to an 
agency that supports only a small fraction of re-
lated research and thus only a small fraction of
the user community. Having DOE administratively 
lead all genome projects could prove unmanage- 
able in the long term. 

The controversy over which agency should lead 
—DOE or NIH— may be misguided. Each agency
has a role to play, and discussion should focus
instead on how to encourage cooperation and to
ensure that the research program of one agency 
does not inhibit that of the other. One observer 
has asserted that a major directed program at NIH 
alone would soon be politically incorporated into 
the overall NIH budget and would thereafter dis-
place untargeted research. The corollary is that
"DOE could find the leadership excellence more
easily than NIH could provide the budgetary in-
sulation" (14). Nonetheless, NIH is the logical choice
for lead agency if Congress chooses to designate 
one— its mission is most directly affected, and the 
scientific community now supported by NIH is by 
far the largest of the intended beneficiaries of ge- 
nome projects. If NIH leads, then the expertise 
and multidisciplinary research already supported
by NSF and DOE should be explicitly taken into
account in future planning. Difficulties in desig-
nating a lead agency are discussed under options 
for action by Congress in chapter 1. 

Interagency Agreement and 
Consultation 

The lack of a lead agency implies no favored 
research strategy or funding mechanism, but a 
balanced program to take advantage of NIH, DOE, 
NSF, and other agencies' strengths. Agencies could 
be left to themselves to cooperate and communi- 
cate among themselves and with other interested 
organizations in the United States and abroad (see 
figure 6-2). A group of agency principals— agreed 
to by the agencies or under the Office of Science 
and Technology Policy— could meet to achieve 
these goals and exchange details of research direc- 
tions and developments. 

An interagency agreement and consultation 
framework eschews any formal creation of au- 
thority and relies on the good will of the partici- 
pants to exchange information freely. Such an ar- 
rangement allows each agency autonomous, and 
presumably efficient, use of its resources and per- 
mits each agency to address those research topics 
most closely associated with its institutional in- 
terest. Interest may not always correspond to ex- 
pertise, however, and it may conflict with or over- 
lap other agencies' programs. This would act 
against one ostensible goal of the cooperative 
effort— to streamline projects by eliminating un- 
necessary duplication of research. An informal 
or ad hoc framework may also be inappropriate 
for very expensive, long-term projects because
evolving and potentially diverging priorities may 
diminish rapport among the agencies. 

A communications and consultation committee
could be responsible for these cooperative, com-
munications, and streamlining functions, but the
institutional focus would be scattered and the
project would exist without clear leadership. Al-
though in the best scenario such a committee
would be completely abreast of all the domestic
research, it might be too diffuse a body to sup-
port international organization of a project.

Figure 6-2.— Interagency Agreement and Consultation

Decentralized authority is not without benefits,
however, for pluralism of funding sources and
flexible, decentralized organization are strengths 
of American science. Genome projects may be 
compelling enough to turn the organizational 
gears without creating a special bureaucracy for 
the task. A cooperative effort also permits a flexi-
ble mix of funding options, and each agency would
retain control over its research planning. 

The subcommittee on the human genome of the 
Biotechnology Working Group of the Domestic 
Policy Council acted as "a mechanism for exchang-
ing information . . . [with] the right people at the
right level," according to David Kingsbury of NSF
(23). This coordinating group will now be located 
under a life sciences committee at the Office of 
Science and Technology Policy. Such a group of 
government administrators facilitates interagency 
communication but may not address other needs. 
A purely government body is open to the criti- 
cism that scientists and not government adminis- 
trators must provide direction or at least partici- 
pate directly in planning (31). This conflict is 
similar to that found in creating an advisory body,
which generally reflects the question of how much 
influence scientists should have on the science pol- 
icy process (discussed below). 

The merit of informal agreement and consulta- 
tion is that each agency would have the flexibility 
to follow its own research agenda. Agreement and 
consultation would not require legislation by Con- 
gress and would make interagency cooperation 
a matter of congressional oversight. A disadvan- 
tage is that flexibility may be achieved at the cost 
of clear authority and accountability. Further,
there might be no mechanism for resolving con-
flicts among agencies. The appropriateness of in-
formal interagency cooperation turns on a judg-
ment of which is more efficient— a directed and
planned effort or a pluralistic and decentralized 
process. 



Interagency Task Force 

A genome initiative might require more active 
leadership than that described above. An inter- 
agency task force dedicated to pursuing the ge- 
nome project and wielding some authority over 
funding and research might provide such leader-
ship (see figure 6-3).

The task force could be constituted much like 
an interagency committee, with principals from
the participating agencies; however, the task force
would possess authority in certain areas, such as
gathering of information from participating agen-
cies, preparation of reports, formulation of rec-
ommendations, and interagency planning. It could
design and direct a genome project, drawing on 
each of the participating organizations (see box 
6-A). 

A task force would be much like a lead agency 
in its ability to draw the attention of foreign re-
searchers, the media, and domestic political in-
terests. And like a lead agency, the task force
would present a central character— its chairper-
son— who would act as spokesperson for the proj-
ect. If the chairperson of the task force were
selected from the agency representatives, how-
ever, the appointment would likely carry with it
the same kind of political difficulties as selecting 
a lead agency. 

Establishing a functional authority may require
substantial political investment, but the cost of
subsequent decisions is negligible because they 
can be immediate and final. With a committee 
authorized only to facilitate communication, a 
dilemma in the assignment of a particular research 
project, for example, could be costly in any num-
ber of ways, from the time it takes to reach a co-
operative solution to the money required to dupli-
cate the research should no equitable distribution

Figure 6-3.— Interagency Task Force

Box 6-A.— Acid Precipitation Task Force



The Acid Precipitation Act of 1980 (Public Law 
96-294), Title VII of the Energy Security Act, estab- 
lished a 10-year program to reduce or eliminate the 
sources of acid precipitation. To implement this pro-
gram, Congress mandated the formation of the Acid
Precipitation Task Force, composed of members 
from the national energy laboratories, the agencies,
and four presidential appointees, and chaired jointly 
by representatives from the National Oceanic and 
Atmospheric Administration (NOAA), the U.S. De- 
partment of Agriculture (USDA), and the Environ- 
mental Protection Agency (EPA). The task force is 
thus a truly interagency body, drawing on a vari- 
ety of agency expertise for leadership. 

The legislative history describes the task force as 
being charged with preparing a comprehensive re- 
search plan, to include individual research, eco- 
nomic assessment, Federal coordination, interna- 
tional cooperation, and management requirements. 
The comprehensive plan is implemented and man- 
aged by the task force. The Acid Precipitation Task 
Force could thus serve as a model for an interagency 
task force dedicated to genome projects. 

In 1985, representatives from the various agen- 
cies signed a memorandum of understanding that 
fixed the structure for administering the act. The 
memorandum assigns authority and responsibility 
to: 1) a Joint Chairs Council, consisting of principals 
from USDA, DOE, EPA, NOAA, the Department of 
the Interior, and the Council on Environmental
Quality, and responsible for approving the annual 
research program and the corresponding portions 
of the budgets of the participating agencies; 2) the 
task force, to review the annual research program 
and budget and to provide advice and recommen- 
dations to the council; 3) the Director of Research 
(appointed by the Joint Chairs Council), to formu- 
late the research program and budget; 4) the Inter- 
agency Scientific Committee and the Interagency 
Policy Committee, consisting of senior scientific and 
policy executives, respectively, from the agencies, 
to advise and recommend; 5) an External Scientific 
Review Panel; 6) the Office of the Director of Re- 
search, consisting of scientists and support staff; 
and 7) research task groups, each under the lead 
of a specific agency, to develop a research plan and 
budget for a particular task. 

SOURCE: Office of Technology Assessment, 1987, based in part on U.S. Congress, General Accounting Office, Acid Rain: Delays and Management Changes
in the Federal Research Program, GAO Pub. RCED-87-89 (Washington, DC: GAO, 1987).



Shortly after the 1985 reorganization, the Gen-
eral Accounting Office reviewed the program at the
request of Congress, because management changes
and delays in reporting had become constant. The
General Accounting Office's recommendations are
more functional than structural, and they relate to 
the difficulty of issuing public reports under great 
scientific uncertainty. The almost intractable nature 
of some of the acid precipitation problems is appar- 
ently issue-specific and not related to the organiza- 
tion of the project. 

The authority under the new organization is sig- 
nificantly vested in the Joint Chairs Council, the Di- 
rector of Research, and divided between a scien- 
tific column and a policy column. The fine structure 
is fascinating: It attempts to permit each participat-
ing agency to retain authority over its research ex-
pertise by creating research task groups. For ex-
ample, Interior is responsible for monitoring
deposition, NOAA for atmospheric processes, and
DOE for emissions and control technology. In ge- 
nome projects, distribution according to expertise 
would have NIH focus on mapping techniques and 
biological technologies, and DOE focus on automa- 
tion and robotics and computation. This could be 
useful as long as it did not assign tasks to the wrong 
agency and did not inhibit flexible interagency plan- 
ning for areas of legitimate overlap. The agencies 
participating in the Acid Precipitation Task Force 
are working on a scale similar in magnitude to that 
of a genome project; from fiscal years 1982 to 1987, 
the agencies spent just over $300 million for acid
precipitation research.

The joint chair arrangement, among NIH, DOE, 
and NSF in a genome project, would represent a 
smooth distribution of authority. The appointment 
of a director of research might prove the only bone 
of contention, as the selection might imply what 
style of research— small-group science v. Big Sci- 
ence—is to be funded. The Acid Precipitation Task 
Force also balances the concerns of policy specialists 
with those of scientists and seeks the input of 
nonagency scientists as well (it does neglect non- 
agency policy specialists, however). This inter- 
agency task force approach attempts to combine 
the dynamic properties of an authoritative leader 
with the efficiency of agencies pursuing their own 
research expertise.

be achieved. A task force or lead agency could
eliminate some of this cost. 

A task force may not be able to match efficiency 
in decision making with efficiency in administer- 
ing the agencies' resources. Its recommendations 
could be ignored by agencies, or it could prove 
an obstruction or source of delays. A task force 
is a bureaucratic solution, identifying a person 
or group with the goal of genome analysis and 
building upon the existing authority structure. 
Such authority may be necessary to direct re- 
search and provide a focus for international com- 
munications, but it adds another layer to the 
bureaucracy that separates the administrators of 
science from the investigators. 

Consortium 

Like the task force approach, a consortium 
would involve the creation of a new authorita- 
tive entity. Unlike the organizational structures 
discussed previously, however, this approach 
would require the active participation of private 
firms (see figure 6-4). The introduction of this new 
factor complicates the staging of a genome project. 

The typical consortium is a close working asso- 
ciation between a university research group and 
one or more private firms interested in the pur- 
suit and economic development of that research. 
Government involvement in consortia is often 
limited to financial support during the initial stages 
of basic research, while industry waits to fund 
the development stage. State government is fre-
quently more active than Federal, because the
projects are perceived to be closely linked to lo- 
cal economic development (see box 6-B). 



Figure 6-4.— Consortium 




A consortium of universities, businesses, and 
government is directed toward several mutually 
enriching goals: strengthening universities, stim- 
ulating (competitive) economic growth, engaging 
in basic research, creating generic technologies, 
and developing and delivering specific products 
(6). These goals correspond to those of some ge- 
nome projects, which some persons hope will 
maintain America's competitive position in bio- 
technology against challenges from Asia and Eur- 
ope. Genome projects would, for example, seek 
both to create generic biotechnology tools (such 
as techniques for handling very large DNA frag- 
ments, detecting very small amounts of DNA, and 
designing software for analysis) and to develop 
specific products (such as vectors for cloning DNA
or automated DNA sequencers). Such results 
would benefit university researchers and cor- 
porate investors alike. 

The question of setting the research agenda (in
other instances the responsibility of the individ- 
uals conducting the research and the agencies 
funding it or of the task force established to over- 
see it) is complicated by private firms' need to em- 
phasize technology development and not neces- 
sarily free inquiry. Profit-seeking firms often have 
shorter-term goals than are practical for the sup- 
port of basic research and therefore focus on the 
development of short-term technologies over long- 
term ones. Not all industries have a short-term 
perspective, however: Pharmaceutical firms are 
accustomed to basic research and long-term pay- 
offs—investments requiring more than a decade 
to bear fruit. 

A private emphasis on development is closely 
associated with the effective transfer of technol- 
ogy into the marketplace. A consortium would 
no doubt speed technology transfer to participat-
ing firms, but firms might suppress the spread
of scientific information to protect their invest- 
ment (5). The phenomenon of sitting on data is 
not restricted to industry— academic scientists 
may delay dissemination of information in order 
to consolidate results for their own financial or 
reputational benefit (12)— but proprietary inter- 
est will not help the free exchange of data. Thus, 
the question of proprietary rights versus infor- 
mation in the public domain is a sticky one for 
genome projects, where a naturally occurring
DNA sequence can translate into a multi-million-dollar
product. Thus the presence of commercial
firms in academia is two-sided: Goals of technology
transfer and economic development may be
more easily reached, but the control exerted by
industry over the planning of the research agenda
and the dissemination of results might be too self- 
interested. Concern that the economic aspirations 



of private firms might corrupt the atmosphere 
of academia may be overstated now, since only 
a few businesses have shown any interest in the 
genome project; but the possibilities of reaping 
economic benefits, especially in the current envi- 
ronment of international competitiveness, are 
likely to attract more private sector involvement 
in the future. 



Box 6-B.— Midwest Plant Biotechnology Consortium 

Representing a large number of university, industry, and government partners, the Midwest Plant Bio- 
technology Consortium is an experiment in basic research and technology transfer among agricultural sec- 
tors. Its purpose is to increase the competitiveness of American agriculture and agribusiness through the 
development of basic plant biotechnology research. 

The idea for the consortium began at DOE's Argonne National Laboratory (ANL), which has a historical
research interest in photochemistry and photosynthesis. ANL determined that a coordinated program in
plant science could contribute to biotechnology applications of interest to both industry and government 
agencies. 

When ANL invited participation from universities and industry, it specified a number of principles that
would guide the consortium. The continued importance of both industrial and scientific peer review processes
was stressed, and the intellectual property rights were established from the outset. ANL also emphasized
the regional nature of the consortium, encouraging the participation of Midwest institutions to investigate
for Midwest agribusiness. Aside from these initial guidelines, the original organization of the consortium
remained informal until recently, when it sought incorporation as a 501(c)(3) (tax-exempt) corporation and
developed a more formal budget process.

Government interest in the consortium comes from those agencies involved in the genome discussion— 
DOE, NSF, and NIH— with the addition of USDA. A secretariat operates the consortium, determining policy 
and procedure with informal involvement of government officials. More formal arrangements may be pos- 
sible in the future. An executive board of corporate and university officers oversees the technical and 
administrative operations. A number of research topic subgroups (e.g., plant growth, pesticides-herbicides) 
also exist, and the primary interaction for technology transfer occurs at this level.

The consortium solves the problem of research direction by a two-tiered system: The industrial part- 
ners first select proposals on the basis of commercial potential, and then a peer review system selects on 
the basis of technical merit. The consortium expects the Federal Government (with some State funds) to 
support research through the initial stages, that is, until the industrial partners can see around the develop- 
ment corner to a commercial application. Research proposals developed as part of the consortium will 
be subjected to normal competitive grant review at the Federal agencies. Industry would then fund the 
final steps. 

Organizationally, the Midwest Plant Biotechnology Consortium offers a number of useful parallels to
a genome project. The Federal agencies overlap similarly, as does DOE's attempt to link the research program
to related research at the national laboratories. Intellectual property rights are sensitive in both projects;
the consortium provided from the outset that each research participant would retain rights according to
institutional policy and that the industrial participants would have the right to first disclosure. The consortium
retains the integrity of the peer review system while allowing industry to set some of its own research
priorities based on commercial potential. The parallels break down where some of the short-term commer- 
cial interest in a genome project focuses on automated tools and machinery in addition to the results of 
biotechnical manipulation. Funding for the consortium is also considerably less than what is expected to 
be necessary for a genome project. 

SOURCE: Office of Technology Assessment, 1987.
If consortia related to one or more genome
projects are formed, several issues will have to
be resolved. First, terms of participation must enable
a broad spectrum of private firms to participate.
Small firms with limited resources have had
difficulty, for example, in paying entry fees to some 
biotechnology consortia (22). And nonprofit organ- 
izations, which must make their information avail- 
able on a nondiscriminatory basis under U.S. tax 
laws, might have difficulty in participating if there 
are preferential terms for industrial partners. 

Discussion 

None of the four workable administrative
structures (the lead agency approach, which requires
a choice between DOE and NIH; the cooperative
approach, which requires no new legislation;
the task force, which creates a formal
authority; or the consortium, which adds a dose
of private sector assistance) is static. Administrative
forms may overlap: For example, the consortium
may require a lead agency, or the cooperative
effort may create consortia or task forces to
attain specific objectives. The administrative structure
at the national level does require explicit
choices, however. Congressional action will vary
according to the option chosen. Interagency agree- 
ment and consultation would require no new leg- 
islation, only oversight. Designation of a lead 
agency, establishment of a task force, or creation 
of a single national consortium would require new 
legislation. 



Administration of genome projects will require 
monitoring of some central services and facilities, 
some services and functions performed at centers, 
and many grants to small groups. This raises several
concerns about communication among agencies
and among the scientists whose work they
support. The diffusion of research among a large
number of groups complicates communication, 
but it permits the most flexible organization of 
research; the investigator may be as focused or 
interdisciplinary as the research demands. A re- 
duction in the number of groups reduces the dif- 
ficulty of communication but limits the number 
of people trying wholly new approaches to the 
scientific or technical objective. The pace of in- 
novation may be directly proportional to the num- 
ber of groups: A commitment to a single center 
or institute might fix the relevant technology 
prematurely. When innovation is less important 
than production, then specialized facilities are log- 
ical because they simplify the organization of 
work. Problems of communication for centrally 
administered projects are of a different variety. 
Often the most difficult problem is ensuring that 
services are appropriate and tailored to the needs 
of those using them. Different genome projects 
will have different modes of communication. Proj- 
ects that rely on many small groups will need com- 
munication networks or frequent meetings of sci- 
entists; central services will require feedback from 
user communities. 



ADVISORY STRUCTURE 



Second to the administrative structure in or- 
ganizational hierarchy, though not in importance, 
is the structure of an appropriate advisory body 
or bodies. Agencies supporting genome projects 
will benefit from tapping the academic and indus- 
trial sectors for the requisite expert wisdom. Sim- 
ilarly, academia and industry wish to ensure their 
input into the decision-making process and to ex- 
ercise some control over the research that affects 
their livelihood. The responsibilities, composition, 
structure, and funding of advisory groups then 
become issues. 



Responsibilities 

The primary responsibility of the independent 
advisory board (or boards) would be to follow the 
research plan and budget envisioned by the agencies,
task force, or consortium and to make recommendations
where appropriate. Such recommendations
might include identification of
promising research initiatives in need of funding
or oversight of standards necessary to ensure quality
control. The board could be granted budget
authority to enact these recommendations, or its
role could be strictly advisory. Consideration of
broad overarching issues— such as the ethical im- 
plications of using some newly developed tech- 
nologies or the economic benefits of targeted tech- 
nology development— could also be a function of 
the board. 

The advisory board would naturally have a 
reporting duty: to the participating agencies, to 
Congress, to the public, and perhaps to the inter- 
national community of scientists. The advisory 
board would be an organ of communication 
among the agencies, supplementing their infor- 
mal direct contact. Congress would probably want 
to be kept abreast of research progress and could 
require periodic assessments in order to plan ge- 
nome projects and other research initiatives. An- 
nual or biannual reporting to Congress on prog- 
ress and the distribution of funds could be fit into 
the budget process, for this will be one way in 
which genome projects are held accountable to 
the taxpayers. The executive branch could be kept 
up to date by the advisory board or through the 
Office of Science and Technology Policy. An advisory
board not composed entirely of Federal
officers would fall under the Federal Advisory
Committee Act (Public Law 92-463). Pursuant to 
the act, the advisory board's meetings and papers 
must be open to the public. The advisory board 
could also be the contact for international com- 
munication. 

Composition 

An advisory board would require members with 
varied backgrounds. Scientists with experience 
in the planning of mapping and sequencing work 
would be needed for technical advice. Scientists 
with database expertise would also be required, 
as the storage and dissemination of the project's 
information is as central as the generation of it. 
Scientists could be chosen from universities, in- 
dustry, and federally supported laboratories. 
Choosing the board involves the same issues as 
the consortium decision: how much influence de- 
velopment- and profit-minded industry experts 
should have on the project. One suggestion, from 
an industrial association, is to set up an advisory 
board with 50 percent university, 30 percent gov- 
ernment, and 20 percent industry representatives 



(9). This would in fact be an extension of current 
practice, as university and industry representa- 
tives often work together productively. The selec- 
tion of scientists from abroad to serve on the advi- 
sory board, perhaps as nonvoting members, would 
help it assume an international role. 

Since the project's impact would extend into general
science policy, economic competitiveness,
medical care delivery, and the like, experts from 
such fields might be included. The board might 
want, for example, to ensure that other areas of 
biomedical research do not suffer from a drain 
of funds or personnel, and policy experts and 
economists would be helpful in this. Lawyers 
might be necessary to address questions of intel- 
lectual property. Ethicists might be included to 
help the board address such issues as confiden- 
tiality of data on research subjects or whether 
to investigate the chromosome containing disease 
gene A before that containing disease gene B. Rep- 
resentatives of interested private philanthropies, 
particularly those supporting research in human 
genetics, might also be included. An advisory 
board would logically include at least a represent- 
ative of the Howard Hughes Medical Institute, as 
it funds a substantial portion of genome projects. 

Structure and Funding 

Scientists and nonscientists could serve together 
on a single advisory body or on separate bodies. 
The choice will influence the method of research 
planning and science policy formation: In a sin- 
gle body, the procedure is multifaceted but 
essentially unitary; in separate bodies, the procedure
is separated into scientific and policy components.
Another possible division of advisors
would be government representatives on one 
panel and private representatives, from academia, 
industry, and other backgrounds, on another. 

Appointments to a policy board could be made 
by the President, with the advice and consent of 
the Senate. The choice of members could be as- 
signed to a nongovernmental body, such as the 
National Academy of Sciences, to ensure the 
board's independence and its technical compe- 
tence. As an alternative, the task of selection could 
be delegated to the Office of Technology 
Assessment. 
BIG SCIENCE V. SMALL-GROUP SCIENCE



The likelihood that Big Science will invade 
molecular biology has often been cited in opposi- 
tion to a concerted government program of ge- 
nome projects. Small science is largely conceived 
and executed by a principal investigator direct- 
ing a small laboratory group funded by a grant. 
Big Science can refer to many things. It can mean 
large and expensive facilities. It can refer to large, 
multidisciplinary team efforts that entail cooper- 
ative planning and therefore require individual 
scientists to sacrifice some freedom in choosing 
goals and methods. Or it can refer to bureaucratic 
central management by government administra- 
tors. These different meanings have been inter- 
mingled in the emotionally charged debate about 
genome projects. (For further insight into that de- 
bate, see box 6-C.) 

Three lines of argument have been made against 
conducting molecular biology research on one of 
these Big Science models: style, efficiency, and po- 
litical interference. 

Displacement of Higher-Priority 
Science 

Some scientists worry that a major Federal pro- 
gram to map the human genome and sequence 
a significant portion of it would detract from the 
conduct of more important science (2,3,20). The 
argument is that special appropriations for hu- 
man genome projects could well go to projects 
that do not present the most immediate obstacles 
to scientific progress and might supplant funds 
that would be allocated differently by the peer 
review processes of scientific agencies. If genome 
projects were not of the same scientific caliber 
as projects in other areas of science, agencies 
would nonetheless be precluded from reassign- 
ing those funds. 

Other scientists argue that some genome proj- 
ects do not lend themselves easily to current re- 
view procedures and merit a special effort (7,10, 
19,25,30). Genome projects will involve not only 
science, they say, but also technology development 
and production. Some aver that existing peer re- 
view committees give short shrift to projects in- 
tended to develop methodology (as opposed to 



answering a scientific question) and tend to under- 
fund shared research resources. They believe that 
the value of genome projects warrants a special 
effort, including new peer review committees and 
increased resources for a research infrastructure. 

A related issue concerns the details of funding 
mechanisms. Those who believe strongly in the
superiority of investigator-initiated small-group 
research urge caution in supporting large projects 
that are administered by institutions rather than 
individuals. The agencies most directly involved— 
namely, NIH and DOE— are adopting policies that 
answer both arguments by promising to use a system
of peer review that gives the scientific community
substantial power to direct genome projects
but that differs from current peer review by
adding new review groups to focus on component
genome projects.

Style and Efficiency 

Some scientists have objected to a Big Science 
approach to genome projects because it goes 
against the tradition of science as a cottage industry
conducted by small, largely autonomous
groups. The underlying assumption is that Big Sci- 
ence management would undercut the motivation 
and circumscribe the freedom of investigators by 
making them beholden to administrators in a sci- 
entific bureaucracy. Yet team effort is likely to 
be cheaper and faster in the long run for genome 
projects that focus on developing instruments or 
producing maps. It would be unwise and waste- 
ful to shun all projects that do not conform to 
the small-group mode. One science administra- 
tor advised scientists that: 

. . . insofar as what they do is part of the war 
against human suffering, their desires and 
tastes are not all that matter. Biomedical sci- 
ence is not done, or, more important, is not 
supported by the public, simply because it 
gives intense satisfaction to the dedicated and 
successful biomedical researcher (32). 

Large and expensive projects must meet cer- 
tain criteria, otherwise they could indeed supplant 
other research. They must meet needs that can- 
not be met by small-group research (e.g., produc- 
Box 6-C.— Quotes on Genome Controversies



Proposals for genome projects, particularly sequencing the human genome, have provoked considerable
controversy among luminaries in molecular biology and related disciplines. The following quotations
illustrate the liveliness of the debate over the past 2 years. 

"Sequencing the human genome is like pursuing the holy grail." Walter Gilbert, Harvard University, 
at several national meetings, March 1986 to August 1987. 

"[Sequencing the genome now] is like Lewis and Clark going to the Pacific one millimeter at a time. 
If they had done that, they would still be looking." David Botstein, Whitehead Institute, Cold Spring Harbor 
Symposium on the Molecular Biology of Homo sapiens, June 1986. 

"Humans deserve a genetic linkage map. It is part of the description of Homo sapiens." Raymond White, 
Howard Hughes Medical Institute, University of Utah, in Science 233:158, 1986. 

"The idea is gaining momentum. I shiver at the thought." David Baltimore, Director, Whitehead Insti- 
tute, in Science 232:1600, 1986. 

"Of course we are interested in having the sequence, but the important question is the route we take 
to getting it." Maxine Singer, Director, Carnegie Institution of Washington, in Science 232:1600, 1986. 

"Sequencing the human genome would be about as useful as translating the complete works of 
Shakespeare into cuneiform, but not quite as feasible or as easy to interpret." James Walsh, University 
of Arizona, and Jon Marks, University of California, Davis, in Nature 322:590, 1986.

"I believe such a conclusion [against special efforts to sequence the human genome] represents a failure
of vision, an unwarranted fear of (not very) 'big' science." Robert Sinsheimer, University of California, Santa 
Cruz, in Science 233:1246, 1986. 

"My plea is simply that we think about this project in light of what we already know about eukaryotic 
genetics and not set in motion a scientifically ill-advised Juggernaut." Joseph Gall, Carnegie Institution of 
Washington, in Science 233:1368, 1986. 

"Too bad that it needs such fancy wrappings to attract public attention for an obvious good." Joshua
Lederberg, "The Gift Wrapped Gene," in The Scientist, Nov. 17, 1986, p. 12. 

"The sequence will give us a new window into human biology." Renato Dulbecco, Salk Institute, interview
with OTA staff member, January 1987.

"Of course, if you have the clones, you're going to want to sequence them. The question is which ones 
to do first. I think it is scientifically arrogant to prejudge what will be important and what will not." Paul 
Berg, Stanford University, interview with OTA staff member, January 1987. 

"I'm surprised consenting adults have been caught in public talking about it [sequencing the genome] ... it 
makes no sense." Robert Weinberg, Whitehead Institute, in The New Scientist, Mar. 5, 1987, p. 35.

"The sequence of the human genome would be perhaps the most powerful tool ever developed to explore
the mysteries of human development and disease." Leroy Hood and Lloyd Smith, California Institute
of Technology, in Issues in Science and Technology 3:37, 1987. 

"The main reason that research in other species is so strongly supported by Congress is its applicability
to human beings. Therefore, the obvious answer as to whether the human genome should be sequenced 
is 'Yes. Why do you ask?' " Daniel Koshland, Editor, Science 236:505, 1987.

"The real problem that faces us is not the cost of the Human Genome Program, but how to get it going,
seeing both that the right people are in charge and that they work under an administrative umbrella that 
will not tolerate uncritical thinking and so will never promise more than the facts warrant." James D. Wat- 
son, Director's Report, Cold Spring Harbor Laboratories, September 1987. 

"We will see a new dawn of understanding about evolution and human origins, and totally new ap- 
proaches to old scientific questions." Allan Wilson, University of California, Berkeley, at a symposium for 
the director, National Institutes of Health, Nov. 3, 1987.





tion, service, or targeted technology development). 
They must not merely be useful, but fill critical
resource gaps as well (4). These criteria are likely
to be met by many databases, repositories, and
mapping projects. They have not yet been met by
proposals to sequence the entire human genome. 

Some argue that, while it may appear that cer- 
tain projects are best conducted by large, multi- 
disciplinary teams, in the long run science pro- 
gresses faster if large, targeted projects are not 
begun (20). That is, small-group science is so much
more productive in the long run that attempts to
direct science will inevitably go astray. 

Similar debates preceded the approval of costly 
projects in other fields. Construction of cyclotrons 
and other particle accelerators was resisted by 
many physicists in the 1930s [Heilbron and Kevles,
see app. A], and space-based instruments were
opposed by many astronomers in the 1960s (26). 
Yet these facilities permitted scientific advances 
that would otherwise have been impossible, and 
they were (and are) most often used by small re- 
search groups. The issue is not that expensive fa- 
cilities should not be built, but that they should 
address critical needs and be carefully planned. 

Politicization 

One way in which concerted projects are be- 
lieved to drift into inefficiency is through politi- 
cal interference. This can be on a small scale (hag- 
gling that impedes progress among members of 
a research team) or a large scale (e.g., pork barrel 
science at the national level). One scientist has ob- 
served, "a megaproject like sequencing the human 
genome is certain to increase the political control 
over scientific decisionmaking" (3), and the American
Society for Biochemistry and Molecular Biology
warns against "the establishment of one or
a few large centers that are designed to map
and/or sequence the human genome" (1). Large
research institutions can drift once their missions 
have been accomplished, and it can be difficult 
to close down unproductive efforts (32). 

Molecular biology has been remarkably produc- 
tive for three decades without the management 
style of Big Science. In the recent inventory of 
275 Big Science facilities compiled by the House 



Committee on Science and Technology, none was 
biological (29). Yet some human genome projects, 
for example developing new instruments or pool- 
ing results from many different groups, will re- 
quire multidisciplinary teams concentrating on a 
technical problem. This situation is analogous in 
many ways to the situations faced earlier by other 
sciences in their transition to Big Science [Heilbron
and Kevles, see app. A] (32). It is difficult
to imagine, for example, automating the steps in 
cloning DNA, sequencing it, or mapping it with- 
out combining optics, chemistry, physics, engi- 
neering, and electronics. If the end products of 
genome projects— materials and information— are 
to be reliable and used internationally, there must 
be quality control and standardization. 

Clearly, some important functions require cen- 
tral coordination or multidisciplinary team re- 
search, although not necessarily centralized ad- 
ministration; some tasks cannot be forced into the 
mold of small-group science. Technological developments
will determine the pace and extent to
which Big Science becomes part of biological re- 
search. The question will be how to decide which 
projects merit special effort and which do not.
Decisions of several types will be necessary in con- 
ducting genome projects. The advantages of de- 
centralized planning must be balanced against the 
need for some centralized resources. The impor- 
tance of mapping, sequencing, and technology de- 
velopment must be compared to other research 
and services. Such decisions will require an admin- 
istrative structure to make them. 

Biomedical investigations are now, and in the 
foreseeable future will continue to be, conducted
primarily by small groups, although Big Science 
facilities and services can amplify and complement 
them. Small groups will remain the principal 
means of studying physiology and disease. When 
new institutions are created for elements of hu- 
man genome projects, special attention must be 
paid to making results useful to small scientific 
groups. It would be ironic if genome projects 
starved small-group research efforts in order to 
create new tools. 

The costs of database, repository, and map proj- 
ects are not large relative to the costs of other 
biomedical research, so planned projects are unlikely
to have any measurable adverse impact on
other research. Moreover, genome projects intended
to bolster the research infrastructure
should free funds for new work by making research
faster and less costly. If genome projects
threaten the health of small-group biomedical research,
then genome projects should take a back seat.



SUMMARY 



The Howard Hughes Medical Institute recently 
issued a short report on efforts to map the hu- 
man genome; it observed: 

The sooner the entire genome is mapped and 
sequenced once and for all, the sooner scientists
can get on with the real work of human biology: 
understanding what the genes do (21). 

Databases and repositories must be centrally
administered, although not necessarily centrally
located, in order to be widely accessible. Technology
will most likely determine whether and
when large facilities and coordinated administration
are necessary to conduct genome projects.
If large facilities prove to be more efficient, this
will not necessarily be incompatible with research
by small groups; it could in fact enhance it. If,
however, large facilities and centrally organized
research programs threaten the lifeblood of biomedical
research (investigator-initiated grants),
then the projects should be reevaluated and, if
necessary, cut back.
Chapter 7
INTRODUCTION



The expected health benefits of genome proj- 
ects—and their commercial potential— have at- 
tracted international as well as national attention. 
The United States is the clear leader in basic 
research, publishing more articles on mapping 
and sequencing than European or Asian nations 
(see figure 7-1, table 7-1). U.S. companies have also 
marketed more instruments for DNA research 
than any others (see ch. 2). Productivity in basic 
and applied research does not, however, guaran- 
tee the United States the lead in developing or 
producing commercial products and processes, 
nor does it ensure market competitiveness. Japan 
has also encouraged the commercial development 
of technologies associated with the mapping and 
sequencing of DNA. Countries such as Switzerland 
and West Germany are home base for multinational
pharmaceutical and chemical companies that are
poised to commercialize developing products. 
Some nations not supporting much basic genome 
research at present have strong biotechnology or 
high-technology resources and policies and might 



Figure 7-1.— Distribution of Publications in
Human Gene Mapping and Sequencing
Compiled from a bibliometric analysis of literature on human
gene mapping and sequencing conducted for the Office of
Technology Assessment by Computer Horizons, Inc. (see
apps. A and E). The differences between the annual percentages
displayed and the total annual research (100%) can be
attributed either to countries not included in the listing or
to the absence of sufficient bibliographic information to de- 
termine the country or region from which the publication 
originated. 

SOURCE: Office of Technology Assessment, 1988.



Table 7-1.— International Distribution of Human Genome Research
(percent of articles published annually on human gene maps or markers)

                                1977 1978 1979 1980 1981 1982 1983 1984 1985 1986

United States                      ?    ?  45%    ?    ?    ?    ?    ?    ?    ?
Japan                              2    2    3    3    4    5    4    4    4    5
Western Europe
  Denmark                          1    3    1    1    2    1    1    1    1    1
  Federal Republic of Germany      5    4    2    4    6    5    ?    ?    ?    ?
  Netherlands                      4    5    3    3    2    2    2    2    ?    ?
  United Kingdom                [values illegible]
  [label illegible]                7    4    7    4    6    5    6    5    4    5
Other non-European countries
  Australia                       <1    1    2    2    2    2    2    2    1    5
  Canada                           3    3    3    4    2    3    2    3    4    4
Eastern Europe and U.S.S.R.        5    3    3    5    5    4    3    4    4    3
[label illegible]                  5    4    6    4    4    5    3    3    5    3

? = value illegible in the source copy.

SOURCE: Bibliometric analysis conducted for the OTA by Computer Horizons, Inc.
be well positioned to commercialize technologies
that are developed for and spun off from human
genome research.¹ The OTA has found that
Government agencies in the United States are further
along in developing policies for genome
projects than are comparable agencies in other
countries, although a number of other countries
have well-established basic research efforts in
mapping and sequencing human and nonhuman 
genomes— efforts that could either complement 
or compete with U.S. efforts. 

Gene mapping is perhaps the most common international
research activity in human genetics,
and it is likely to be an area to which many na- 
tions will contribute. Human genes are highly 
polymorphic, and populations from different 
regions exhibit considerable genetic variation. 
These regional differences will allow researchers 
to contribute to comparative studies, as well as 
to characterize and map genes of particular 
regional interest (e.g., the thalassemias in the 
Mediterranean and Oudtshoorn skin disease in 
the Afrikaner population in South Africa). The 
study of DNA from diverse peoples will shed light
on the nature of polymorphisms and genetic dis- 
orders, even if it does not lead immediately to im- 
proved health care (8). 

The large scope of genome projects invites
international cooperation. Informal cooperation and
collaboration are already underway through a
variety of mechanisms. Formal collaboration could
speed research and reduce the financial burden
on each country. Maintenance of international
databases and repositories is particularly
important to provide timely access to information
from research conducted around the
world. Many scientists encourage international
cooperation in genome research, but any effort
to conduct genome mapping and sequencing projects
on an international scale must be based on
a realistic assessment of the capabilities and
interests of the countries involved.

¹It is not within the scope of this assessment to provide a detailed
analysis of biotechnological capabilities and industrial funding;
suffice it to say that genome research is part of a much larger arena
of Federal, university, and industrial research and development.
A forthcoming OTA assessment, New Developments in Biotechnology,
4: U.S. Investment in Biotechnology (81), covers this in great
detail for the United States. A previous OTA assessment, Commercial
Biotechnology: An International Analysis (79), describes the state
of biotechnology in Western Europe and Japan; the more recent
Department of Commerce reports, Biotechnology in Western Europe
and Biotechnology in Japan (39), offer updated information
on international efforts.

Countries that do not themselves carry out the 
kinds of research involved in mapping and se- 
quencing can play an important role by collect- 
ing genetic material from families for compara- 
tive studies. One such project, a collection of 
genetic material from a group of Venezuelan fam- 
ilies, was a key factor in the successful search for 
the gene that causes Huntington's disease (see box
7-A). Similar pedigree collections are being estab- 
lished and maintained in Egypt and Denmark, as 
well as in isolated populations in the United States 
such as Mormon and Amish communities. These 
pedigrees provide valuable source material for the 
study of polymorphisms and genetic disease in 
human beings. 

This chapter summarizes the state of DNA map-
ping and sequencing research in Japan, Western 
Europe, and elsewhere. Issues of international co- 
operation and competition and precedents for in- 
ternational cooperation in science are examined. 
Some organizational options for the international 
management of genome projects are proposed, 
specifying areas in which cooperation might best 
be achieved and describing cooperative frame- 
works already in existence. Chapter 8 outlines 
questions about international technology trans- 
fer that might emerge in collaborative or cooper- 
ative situations. 



Box 7-A.— The Venezuelan Pedigree Project 



In the small fishing villages that line the coast of 
Lake Maracaibo in Venezuela lives an unusual group
of families. If you walk into any of these villages, 
you may be met by residents who do a characteris- 
tic dance down the streets— large, jerky motions, 
staggering and weaving from side to side. For many 



years the residents of these villages were ostracized, 
considered to be chronically drunk. But in the early 
1970s a doctor from a nearby military base real- 
ized that the dance was not due to alcoholism but 
to Huntington's disease, a rare, dominant genetic 
disease that causes degeneration of nerve cells in
the brain. The onset of Huntington's disease is gen-
erally late: In those who carry the gene, symptoms
begin at age 35 or older. The disease leads to loss 
of control of the voluntary muscles, first causing 
twitches and jerks, then dementia, and finally death. 

A preliminary study describing the case histories 
and pedigrees of approximately 100 patients from 
Lake Maracaibo families was presented at a meet- 
ing of the American Neurological Association in 
1972. It was an interesting case: an interrelated set 
of families, along whose pedigree could be traced 
an extraordinarily high incidence of a genetic dis- 
ease that is rare in the general population. At the 
time, however, no one knew what to do about it. 
The case remained an interesting anecdote in the 
memories of the researchers who attended the 
meeting. 

One of those researchers was Nancy Wexler, a 
clinical psychologist. Wexler had both a professional 
and a personal interest in Huntington's— her mother 
had died of the disease, so she and her sister each 
have a 50:50 chance of developing it. 

Wexler and her colleagues remembered the case 
of the Venezuelan families 5 years later, when writ- 
ing a report on Huntington's disease for a congres- 
sional commission. One of their recommendations 
was to initiate a genetic study of the Venezuelan 
pedigree. Starting in 1979, the National Institutes
of Health appointed Wexler to direct a program that
would implement the recommendations and set 
aside funding for the Venezuelan genetic study. The 
first team of researchers went to Lake Maracaibo 
in 1981 to collect blood samples from which to
extract DNA. At the same time, they compiled a
careful record of the pedigrees of the volunteers from
whom the blood was extracted. Research teams
have gone every year since. The pedigree has grown
to include over 7,000 family members; the diagram
of it occupies a 100-foot-long section of a corridor
near Wexler's Columbia University office. DNA
samples have been collected from nearly 1,500 family
members, some with Huntington's and some
without.

[Photo: Huntington's patient being rowed across Lake Maracaibo.]

At the same time the genetic study got underway, 
advances in recombinant DNA technology, specifi- 
cally the elaboration of techniques for finding 
genetic markers using RFLPs (see ch. 2), increased 
the power of analytical methods that could be used 
on the collected family materials. In 1982, Jim 
Gusella and others began to screen the DNA from 
the Venezuelan collection for genetic markers 
linked to the gene for Huntington's disease. They 
tested DNA from normal and affected members of 
the Venezuelan families, comparing the different
patterns cut by restriction enzymes on the samples
from different family members. The fact that the 
pedigree included large extended families was use- 
ful in locating informative markers. By 1983, the 
researchers had figured out which chromosome 
contains the Huntington's gene and had identified 
a linked marker, paving the way for a diagnostic 
test— an extraordinary breakthrough, says Wexler, 
in such a short time. The search for the actual 
gene is not yet over, however, since locating more 
closely linked markers has presented unforeseen 
difficulties. 

A cure is not in sight for the families of Lake
Maracaibo, but they have made an extremely
important contribution to the study of Huntington's.
Moreover, their genetic materials are a valuable
resource for other genetic studies, including searches
for other disease genes, as well as for the
development of genetic maps. Indeed, some of the DNA has
been contributed to the international mapping
collaboration coordinated by CEPH (see box 7-B).
Wexler suggests that "the pedigree is a big genetic
playground: whatever idea you have, you could
probably test it there."

[Photo: Nancy Wexler going over Huntington's disease pedigrees.]

The Venezuelan pedigree project highlights an im- 
portant role that developing countries can play in 
human genome projects, even if they do not yet have 
the capability to carry out human genome research 
on their own. A similar collection of genetic mate- 
rials from patients with genetic diseases (primarily 
the anemias and thalassemias) and their families was 
started in Egypt in 1964 and has proceeded since 
in collaboration with scientists from NIH and sev- 
eral universities— Oxford, London, Harvard, Colum- 
bia, New York University, and the University of
California, San Francisco. The scientists who manage
this collection are eager to cooperate in international 
efforts to map and sequence the human genome. 
As Wexler points out: 

In many cases, the countries are eager to col-
laborate, but they don't know what they have
to offer. The patient populations are a valuable
resource. And once the working relationships are
established between Third World countries with 
health problems and the high-tech labs in the de- 
veloped countries, the connections are there for 
advice and assistance if those countries get to the 
point of starting their own labs.

JAPAN



Japan's efforts to develop automated DNA se- 
quencing technologies have been highly publicized 
over the past year, causing concern that Japan 
will capture the market for sequencing technol- 
ogy and that it will realize most of the potential 
profits from genome projects. Japan does not, 
however, have well-defined government policies 
for human genome mapping. Instead, funding for 
mapping and sequencing research is under the 
jurisdiction of half a dozen government agencies
that often compete for prestige rather than at- 
tempt to coordinate efforts. 

Mapping and Sequencing Research

The general framework for science policy in Ja- 
pan is formulated by a small group of bureaucrats 
in the various agencies and by an inner cabinet 
group, the Council on Science and Technology, 
chaired by the prime minister. Programs for hu-
man genome research have been divided among
the Ministry of Education, Science, and Culture 
(MESC), the Science and Technology Agency (STA), 
and the Ministry of International Trade and Indus-
try (MITI). The Ministry of Agriculture, Forestry,
and Fisheries supports some research on nonhu- 
man genomes, notably a $500,000 feasibility study 
on sequencing the entire genome of rice (77).

The Ministry of Education, Science,
and Culture

Most mapping and sequencing research falls un- 
der the domain of MESC, the primary supporter 
of basic research in Japan. Like the National In- 
stitutes of Health in the United States, MESC sup- 
ports research projects selected by peer review; 
it provides grants and funds for universities and 
university-based researchers and for several na-
tional research institutes. In addition, the minis- 
try can encourage research in specific, targeted 
areas on the recommendation of its advisory 
committees. 

The ministry does not yet have an official pol- 
icy regarding genome research but has appointed 
an advisory committee to study the situation. 
Members of the committee visited the United 
States in early 1988 to gather information on U.S.
policies on human genome research and to ascer- 
tain what the U.S. expects of Japan. The commit- 
tee's recommendations will be implemented be- 
ginning in fiscal year 1989 or 1990 (58). 

Japan is often criticized for not doing enough 
basic research; many observers have questioned 
whether Japanese scientists have enough exper- 
tise in basic molecular biology to support a major
gene mapping or sequencing effort [Yoshikawa,
see app. A]. Bibliometric analysis [see app. E] indi-
cates that while Japan's research output in DNA
mapping is far below that of the United States (fig- 
ure 7-1, table 7-1), its proportion of research rela- 
tive to other countries has consistently increased 
over the last decade. Its share of publications on 
human gene mapping and sequencing rose from 
2 percent in 1977 to 5 percent in 1986, compared
to a U.S. share that varied from 40 to 46 percent 
during those years. In addition, MESC supported 
the research of a scientist that led to the publica-
tion of a complete genetic map of E. coli in the
prestigious journal Cell in 1987 (53). U.S. re-
searchers published a map of E. coli at about the
same time, but the Japanese research was nota- 
ble for the speed with which it was done and for 
the use of automated technologies. 

The Science and Technology Agency 

STA supports mostly mission-oriented basic re- 
search. It has played a leading role in the devel- 
opment of automated sequencing technology. 
Since 1981, STA's Special Coordination Fund for 
the Promotion of Science and Technology has un- 
derwritten a program entitled Extraction, Analy- 
sis, and Synthesis of DNA, with a total funding 
of $3.8 million (40). The project, led by Akiyoshi 
Wada of the University of Tokyo, aims "to re-
duce the burden of time demanded of research-
ers working on the analysis of DNA base sequences
by developing automatic machinery," utilizing the 
knowledge and resources of companies with ex- 
pertise in electronics, robotics, computers, and 
material science [Wada quoted in Yoshikawa, app. 
A]. The project scientists are adapting robotic tech- 
niques and mass production machines to automate 
the time-consuming steps in the Maxam and Gil- 
bert sequencing process (see ch. 2) rather than 
developing new processes. The project has re- 
sulted in a prototype of a microchemical robot, 
made by Seiko, but it is not yet on the market.
The goal of the project has been to increase the 
rate of DNA sequencing output in general, not to 
sequence the entire human genome. Wada has 
repeatedly emphasized the necessity for interna- 
tional cooperation in the project and would like 
to develop a supersequencing center to operate 
as a service facility for scientific groups around 
the world (84,87).

STA and a private foundation sponsored an in-
ternational conference in Okayama in July 1987
to discuss the state of DNA sequencing technol- 
ogies and possible strategies for genome sequenc- 
ing in the future; the conference gave no clear 
indication of the pace or direction of future STA 
efforts. Some scientists expressed doubts about 
the STA project, noting that there has been no 
public discussion in Japan about whether or not 
to support Wada's conception of the project and 
that the project is not actively supported by many 
other Japanese scientists [Yoshikawa, see app. A]. 
Still, a quiet consensus has emerged that sequenc- 
ing technology should be developed regardless of 
whether a full-scale project to sequence the hu- 
man genome is launched. 

Oversight of the project has now shifted from 
the Special Coordination Fund to STA's Council
for Aeronautics, Electronics, and Other Advanced 
Technologies (CAEOAT); a decision on the status 
of future directions of the sequencing research 
should be made by spring 1988. The publicity and
momentum of the project are undoubtedly attrib-
utable in part to the active role that ex-Prime Min-
ister Nakasone played in advocating biotechnol-
ogy and related projects [Yoshikawa, see app. A].
Whether the momentum will continue now, af- 
ter Nakasone's retirement, remains to be seen. 



The Ministry of International 
Trade and Industry and the 
Human Frontiers Science Program 

MITI coordinates applied research, linking 
university researchers with industry to encourage 
technology development and commercialization. 
It does not now play a major role in genome re- 
search, but its influence may increase if the Hu- 
man Frontiers Science Program is fully funded. 
A human genome sequencing project may become 
a focal point for the program. 

The Human Frontiers Science Program (HFSP) 
is a proposal for an international, cooperative pro- 
gram of research in basic biology and the devel- 
opment of related "key technologies." The pro- 
posal originated in 1985 in MITI's Agency of 
Industrial Science and Technology (AIST). The pro- 
posal came about partly in response to interna-
tional criticism that Japan does little basic research
itself, but capitalizes on the research of others
(4,23), and partly to emphasize international co-
operation in the face of persistent foreign trade
frictions [Yoshikawa, see app. A]. The HFSP pro-
posal met with a lukewarm reception during early
outings and international conferences, however,
and Nakasone's mention of it at the June 1987 Eco-
nomic Summit meeting roused little enthusiasm
[Yoshikawa, see app. A] (46,76).

If implemented, HFSP would probably enhance
Japan's sequencing effort, since DNA sequencing
technology has been identified as a key area for
development. The program was granted an ini-
tial budget of 197 million yen (approximately $1.5
million) for fiscal year 1987 to conduct a feasibil-
ity study, but the amount to be spent on develop-
ment of sequencing technologies is not yet clear.
Some observers speculate that the proposal will
be shelved now that Nakasone has retired. MITI
officials contend, however, that the program is
still viable (4,90). A December 1987 planning meet-
ing again endorsed human genome sequencing
as a focus for HFSP, but the Ministry of Finance
probably will not decide on the program's bud-
get until 1989 (75).

Commercialization of Mapping and 
Sequencing Technologies 

Potentially marketable technologies that are de- 
veloped for genome projects have been supported 
by the several mechanisms through which the gov-
ernment aids industrial research in technology
development. STA's Special Coordination Fund,
established in 1981, provides incentives for basic
research for new technologies in accordance with
the long-term goals for science and technology
development set by its Policy Committee. STA's
Research Development Corp. promotes commer-
cial uses of government-developed technologies
that might not be used otherwise. The prototype
of Seiko's microchemical machine was developed
with assistance from the Special Coordination
Fund, while the Research Development Corp. has
supported its commercialization. In addition,
Hitachi, Fuji Photo Film, Toyo Soda, and Mitsui
Knowledge Industries have all undertaken re-
search into the automation of DNA sequencing,
and some relevant products are being commer-
cialized. DNA extractors developed by Toyo Soda
are already on the market, as are a gel preparation
by Fuji and autoradiograph readers by Seiko and
Hitachi.

Potentials for Cooperation and 
Conflict With the United States 

Many Japanese scientists are willing to cooper- 
ate in an international genome sequencing project, 
but collaboration will clearly be accompanied by 
economic tensions and competitive posturing both 
by the United States and by Japan. 

The development of similar automated technol- 
ogies by U.S. and Japanese companies may pose 
difficult trade issues. The Japanese concentration 
on sequencing hardware has drawn criticism from
American companies, which fear that the Japa-
nese could take the lead in developing technol-
ogies for the analysis of DNA (89). At present, how-
ever, U.S. manufacturers are clearly ahead in the
development and manufacture of equipment for
manipulating and analyzing DNA (see ch. 2). Jap-
anese companies are not as far along in market-
ing relevant products as is often reported: while
the Seiko machine has been touted in the West-
ern press, few scientists in Japan have even heard
of it (40). In addition, the machine's economy has
been overrated: One frequently quoted estimate
for the sequencing systems is $0.17 per base pair,
with a target of $0.01 or less, but Wada himself
states that the system is still far from reaching
even the $0.17 goal (86). (The present cost of se-
quencing is approximately $1.00 per base pair.)
Finally, despite the customary preference of Jap-
anese officials for buying Japanese machines, offi-
cials of U.S.-based Applied Biosystems, Inc. (ABI,
Foster City, CA) in Japan have reported no diffi-
culty in marketing their DNA sequencing machines
and other instruments used by molecular biolo-
gists [Yoshikawa, see app. A]. To date, Japan is the
largest market for ABI's sequencing machine (47).

One frequently voiced fear is that Japanese com- 
panies are focusing on automating parts of the 
sequencing process that companies in the United 
States have not yet automated (although several 
U.S. firms have begun development). Thus far, 
however, the STA-sponsored technology develop- 
ment effort is based on automating machines that 



ERiC 



141 



139 



use conventional methodology rather than devel- 
oping or using new molecular biology techniques. 
Scientists at some U.S. companies have commented 
that it may have been a mistake for Japan to in- 
vest so much in automating existing methodolo- 
gies when there are new technologies emerging 
that may make the old methods obsolete. 

Databases, which are generally considered use-
ful and politically straightforward areas for co- 
operation on genome projects, present knotty 
problems of ownership of information. Despite 
support within the scientific community, the de- 
velopment of shared databases— even within 
Japan— is problematic. The Japanese Government 
has recognized that Japanese databases and re- 
positories are insufficient to handle even its own 
research and development, and it is trying to estab- 
lish the database infrastructure necessary for a
sequencing effort. It appears, however, that the
effort is not well coordinated: Nearly every one
of the government agencies is setting up a DNA
or protein sequence database for its own purposes,
with a minimum of interaction. The DNA Data 
Bank of Japan (DDBJ), initially established in 
MESC's National Institute of Genetics in 1984 as 
a counterpart to GenBank® in the United States 
and the database operated by the European Mo- 
lecular Biology Laboratory (EMBL), has lacked ade- 
quate staff and computing power. Until recently, 
it operated only as an access node to GenBank® 
and EMBL. It has stepped up its operations, how- 
ever, and is now gathering and entering data from 
Japanese researchers and transmitting it to the 
other databases (see app. D). DDBJ formally joined 
the GenBank®/EMBL collaboration in May 1987; 
the Japanese data were released in the most re- 
cent updates of GenBank® and EMBL. 



EUROPE 



While Japan is often viewed as a prime com-
petitor, many European countries have stronger 
research traditions in molecular genetics and the 
development of related technologies. There are 
notable genome mapping and sequencing activi- 
ties in France, Italy, and the United Kingdom, and 
significant research in gene mapping and tech- 
nology development in Denmark, the Federal 
Republic of Germany, and others. In addition, sev- 
eral supranational organizations in Europe have 
developed targeted programs to encourage bio- 
technology development; human genome projects 
can be and are being included. The following sec- 
tions describe research activities underway in the 
European community as a whole and in selected 
countries, in alphabetical order.²

²The information presented in the sections on selected countries
is based on several sources. The OTA contracted a report on re-
search efforts in key countries in Western Europe [Newmark, see
app. A]. Some information was gleaned from scientific journals and
international news sources. In late 1986 and throughout 1987, OTA
conducted an informal survey of international efforts, contacting
embassy officials, science attaches, and scientists from numerous
countries to request information about the types and funding levels
of genome mapping and sequencing research undertaken in those
countries and asking whether any specific policies governed genome
research. The information gathered from this effort varied consider-
ably in focus, depth, and detail. The countries represented here,
other than those with targeted or particularly well known research
programs, are thus self-selected and self-reported. The result is a
descriptive account rather than a comprehensive analysis.



European Organizations 

Over the past two decades, many European na- 
tions have supported scientific collaboration in
principle, but in practice funding has been a per- 
sistent problem: 

Most European governments have become in- 
creasingly reluctant to invest large sums of pub-
lic money in domestic and civilian R&D, and this
is reflected at the European level. As domes-
tic science budgets in Europe have become hard-
pressed for cash, governments are asking whether 
they are getting value for money from interna- 
tional projects. Scientists in some fields have also 
come to view such projects as unwelcome com- 
petitors for their domestic research budgets (29). 

Nonetheless, several existing organizations in Eur- 
ope either support genome research now or could
do so in the future. 

The European Economic Community 

The founding treaties that established the in- 
stitutions of the European Economic Community 
(EEC) made little explicit provision for research 
and development beyond that needed for Eura- 
tom (which dealt with nuclear energy, including 
radiation biology), the Coal and Steel Community, 
and some coordination of agricultural research
under the Treaty of Rome founding the EEC. In
January 1974, the Council of Ministers agreed on
the general need for an EEC research and devel-
opment policy, and in the mid-1970s, the EEC's
advisory commission began proposing programs, 
including a program of research and training in 
selected areas of genetics and enzymology (bio- 
molecular engineering). It was not until 1981 that 
this proposal was approved, since Article 235 of 
the Treaty of Rome specifies that such programs 
can only be adopted by unanimous agreement of 
all member states (11). 

Support of research and technological develop- 
ment has been enhanced by the adoption of the 
Single European Act, which took effect on July 1, 
1987. This act modifies and extends the Treaty 
of Rome by adding provisions for precompetitive 
research to strengthen "the scientific and tech-
nological basis of European industry and to encou-
rage it to become more competitive at the inter-
national level" (19). Once a multiyear framework
program is unanimously agreed on by member 
states, the individual research and development 
programs within its agreed areas and financial 
limits can be approved by a qualified majority 
(member state votes are weighted roughly by size). 
The current framework program, an initiative to 
help create collaboration in targeted areas in sci- 
ence and technology, was adopted on September 
28, 1987, and runs until 1991, with a global limit
of 5,396 million ECUs (European currency units,
which in recent years have had approximately the
same value as the U.S. dollar). Framework pro-
grams must be proposed by the commission and
approved by the governing Council of Ministers 
and the European Parliament (11). 

Most relevant to genome research is a series 
of research programs in biotechnology: the Bio- 
molecular Engineering Programme, 1981-85; the 
Biotechnology Action Programme (BAP), 1985-89; 
and Biotechnology Research and Innovation for 
Development and Growth in Europe (BRIDGE), 
1990-93. A Concertation Unit for Biotechnology 
in Europe was established in 1984 to coordinate 
the various activities in biotechnology [Newmark, 
see app. A]. These programs have been designed 
to complement national research programs while 
promoting the development of European biotech- 
nology (83).

The budget for BAP has been substantially re-
duced from the original proposal; as of spring
1987, it appeared that approximately $300 mil- 
lion of the proposed $6 billion budget would be 
earmarked for biotechnology research, with 
another $100 million for health, including some 
funds for human genome mapping and sequenc- 
ing work, under the heading of "predictive medi- 
cine" [Newmark, see app. A]. "Within the biotech-
nology program(s), active consideration is being
given to mapping and sequencing technology, and 
in particular with respect to the genome of yeast," 
although "given the range of topics within the cur- 
rent biotechnology program, it would be surpris- 
ing if genome work gained more than a small frac- 
tion of the total" (11). However, "Community 
research expenditures have a catalytic role that 
mobilizes other funds, and a political significance 
that enhances the coherence and consequent ef- 
fectiveness with which national funds are de- 
ployed" (11). BAP encourages proposals that in- 
clude at least one industrial partner in the research
effort or that provide specific evidence of inter- 
est on the part of industry. 

When BAP expires, it will be replaced by 
BRIDGE, which is likely to place even more em- 
phasis on industrial participation. While not yet 
finalized, BRIDGE is likely to include a project to
sequence the genome of yeast, which is more fea- 
sible than sequencing the human genome [New- 
mark, see app. A]. The tentative plan is to under- 
take a 2-year pilot project in which perhaps 15 
laboratories will concentrate on sequencing one 
yeast chromosome; eventually, a large number of 
European yeast laboratories would be involved. 
The pilot project might be launched under BAP, 
but the full project would be part of BRIDGE and 
is provisionally estimated to cost $50 million. The 
project would also try to create a market for se- 
quencing equipment [Newmark, see app. A]. Re-
search on the project will begin soon at some par- 
ticipating laboratories in the United Kingdom 
[Mount, see app. A]. 

A subprogram of BAP, Contextual Measures for 
R&D in Biotechnology, aims to enhance EEC ca- 
pabilities in bio-informatics (the use of computers 
and information science in biology), data capture 
techniques (including advanced instrumentation 
and automated reading), data banks, computer
modeling, computer software, and the "collection
of biotic materials" (repositories), along with the
"development of information and communication
techniques for enhancing the quality and useful-
ness of such collections" and "the development
of techniques for the identification, characteri-
zation, conservation, and resuscitation of the ma-
terials held in such collections" (20). Development
of a biotechnology infrastructure has obvious po- 
tential for researchers in human genetics. 

Another EEC activity that aids genome research 
is the Task Force for Biotechnology Information.
Created in 1982, the task force has produced dis-
cussion papers and has provided small sums of
money, totaling $200,000, to support databases
(including a contribution to software development
at the database of nucleotide sequences run by
the EMBL, discussed below and in app. D) and
the launching of the European branch of the
CODATA Hybridoma Databank, centered at the
American Type Culture Collection in Rockville,
Maryland. The task force work plan for 1987-90
maintains support for databases, communications,
and computational research. The commission of
the EEC also supported a series of workshops and
studies (1984-86) investigating the interface be-
tween biotechnology and information technology
in a planning exercise known as Bioinformatics:
Collaborative European Programs and Strategy
(BICEPS), which "aims to formulate a mid- to long-
term strategy for Europe in bio- and medical in-
formatics" and "overall, to improve the European
competitive position in the rapidly developing
world market for these technologies and applica-
tions" (18). Documents for BICEPS refer to the in-
formatics requirements of human genome se-
quencing and have contributed to plans for
bio-informatics in BAP and BRIDGE and to a pro-
posal for a program of Advanced Informatics in
Medicine (17). The proposed pilot phase, 1988-90,
at 25 million ECUs, was presented by the com-
mission to the European Parliament and the Coun-
cil of Ministers in September 1987. It includes
plans for the development of advanced sequenc-
ing instruments and related computational facil-
ities required in genome and other areas of bio-
chemical and protein engineering research. The
European chemical industry trade association has
endorsed some of the BICEPS proposals and has
indicated a willingness to help support an infra-
structure such as sequence databases (11,21).

Apart from biotechnology programs, the EEC funds
research and development in health. The
commission's original proposals for the framework
program envisaged a Program of Predictive Medicine
and Novel Therapy, which would seek
"development of predictive medicine and novel therapy
oriented towards better knowledge of the
human genome, and genetic engineering processes
aiming at the repair of DNA defects (e.g.,
in congenital diseases of genetic origin)" (11). The
program was designed to support research in four
areas from 1987 through 1991: study of the human
genome (including mapping the genome as
an aid in the diagnosis and prevention of genetic
disease), nucleic acid probes, genetic therapy, and
monoclonal antibodies. Funding for the program,
originally proposed at $75 million, has been
revised downwards to $25 million; both budget and
content may be further revised before the
program is approved.

The European Molecular Biology 
Organization 

Funded by 17 European countries, EMBO serves
primarily to strengthen the training of European
molecular biologists. It supports fellowships,
workshops and training courses, occasional scientific
meetings, and a journal, but it does not directly
support research. EMBO sponsored a meeting of
Europeans with an interest in human genome
research in spring 1987. Few of the scientists present
expressed an interest in mounting a major European
mapping or sequencing project; instead, most
favored informal cooperation between individual
laboratories. The group was pessimistic about
whether public funds could be found for a
large-scale project and raised the possibility of seeking
private funds [Newmark, see app. A].

The European Molecular 
Biology Laboratory 

Located in Heidelberg, West Germany, EMBL
is financed by contributions from 10 of the 17
member nations of EMBO. It houses the
administrative offices of EMBO, but the organizations have
separate budgets and purposes. EMBL's staff of
about 250 scientists and technicians, drawn from
member nations and from West Germany, work
on a scientific program proposed by its
director-general, at present Lennart Philipson, and
subject to the approval of a council composed of
representatives from contributing countries. The
laboratory was founded with the notion that
molecular biology would require facilities that
would be too expensive for any national research
program to support. For the most part, however,
research in molecular biology has not required
large centralized facilities, and member nations
have tended to interact less with EMBL as they
have become proficient at molecular biology in
their own laboratories (28). Consequently,
members have often been grudging in their support,
which limits the projects that EMBL can
undertake. EMBL's annual budget is approximately 45
million deutschmarks (about $26.5 million), 25 to
30 percent of which is paid by West Germany (68).

EMBL sponsors research in instrumentation,
biocomputing, and gene mapping and sequencing
as well as other areas of biology. EMBL's
researchers have been active in technology
development for mapping and sequencing and have
produced prototypes of machines for automating
some of the steps in DNA sequencing (see ch. 2).

EMBL also operates the major European
database of nucleotide sequences, which works in
cooperation with GenBank® to gather and
disseminate sequence data. For EMBL to undertake a
major human genome project would require a
considerable increase in budget (unlikely under
current circumstances) and sustained enthusiasm
from its members [Newmark, see app. A].
Director-General Philipson is eager to promote
collaboration on a genome sequencing project, which he
believes will increase the need for a centralized
European data-handling facility. In the 1986
director's report, Philipson encouraged the
establishment of new support programs for a human
genome project:

If the American plan to launch a programme
on the human genome materializes, the EMBL
may be a natural collaborative partner in this
project. It might, therefore, be worthwhile to plan
for at least one new Programme in one of those
fields to be initiated in Heidelberg at the end of
the proposed Scientific Programme (1990). To
facilitate recruitment and the launching of this
Programme, plans should be available by 1990 but we
do not foresee any cost during the next 4 years (36).

The European Science Foundation 

Headquartered in Strasbourg, France, the ESF
is subscribed to by 49 research councils and
equivalent bodies from 18 European countries (33).
It supports projects on a special funding basis from
a small central fund; in the past, the ESF has not
sponsored much research in biology, although
recently it has supported some protein engineering
work. One of the foundation's standing
committees, the European Medical Research Council,
enables the heads of national medical research
bodies to meet once a year. The council has no
budget, however, and little influence outside the
ESF. At its 1987 meeting, the council decided not
to attempt to coordinate European research on
human genome mapping and sequencing
[Newmark, see app. A].

The European Research Coordination
Agency

A French-initiated response to the U.S. Strategic
Defense Initiative, EUREKA was set up in 1985
to encourage development of advanced
technologies in Western Europe. Participating in EUREKA
are the 18 democracies of Western Europe: the
12 member states of the EEC (Belgium, Denmark,
France, the Federal Republic of Germany, Greece,
Ireland, Italy, Luxembourg, The Netherlands,
Portugal, Spain, and the United Kingdom), the 5
member states of the European Free Trade
Association (Austria, Finland, Norway, Sweden, and
Switzerland), and Iceland.

EUREKA promotes industry-led technological
collaboration among its members in several areas,
including biotechnology and advanced information
technology. It supplements EEC's efforts by
funding research beyond the precompetitive
stage. A EUREKA project must involve at least two
industrial laboratories in two different European
countries. Governments vary in their financial
support of EUREKA projects: Some offer little more
than token support and assistance in administering
an international collaboration; others, such
as France, pay up to 50 percent of the cost of a
EUREKA project. Coordinated by a small secretariat
in Brussels, EUREKA's performance has impressed many
observers. Still, maintaining consistent funding is
difficult, since most of the governments supporting
EUREKA have not created procedures for
funding the program (34). There are no EUREKA
projects for human genome mapping and sequencing
yet, but the program might be used to link
French researchers to industrial partners in
Europe, particularly in the development of sequencing
technologies [Newmark, see app. A].

National Research Efforts in Europe
Denmark 

The National Health Authority, the primary 
funding agency for biomedical research, supports
some gene mapping studies, although there is at 
present no centralized effort. Other funds for gene 
mapping and sequencing come from general al- 
lotments to universities and research institutes, 
from the government, and from research coun- 
cils, notably the Danish Research Council. Spe- 
cial projects can be funded by applying to the 
appropriate research council. The Institute of 
Medical Genetics of the University of Copenha- 
gen is the most prominent Danish effort in the 
field. It has the longest tradition and the greatest 
interest in gene mapping; sequencing is not yet 
a major concern, although it may be in the fu- 
ture. A University of Copenhagen scientist is the 
editor of the international journal Clinical Genetics, 
which publishes mapping studies and similar re- 
search. There are several ongoing projects at the 
institute on various genetic diseases, but there is 
no concerted effort or government policy on map- 
ping and sequencing (70). 

One project of interest is a family pedigree
project that has been underway for more than
10 years. Like the Venezuelan pedigree project
(box 7-A), this is a collection of genetic material
from families with many children; the collection
contains "samples of red cells, serum, plasma,
thrombocytes [parts of the blood that help in
clotting], lymphocytes [cells important in the immune
system], as well as skin biopsies" (59). Unlike the
Venezuelan material, the genetic material in the
Danish project was collected from apparently
normal families; over the years it has been tested by
classical genetic markers to help establish
polymorphic regions for genes of different blood
groups, enzyme types, and so on. Extensive RFLP
mapping (see ch. 2) of the material has not been
done because of limited resources, but
negotiations are underway to contribute material to the
Center for the Study of Human Polymorphism
(CEPH), an international gene mapping center
located in Paris, for further mapping. There is as
yet no clear policy in Denmark on whether to
sequence large portions of the family material,
especially because resources are limited, but the
research group is exploring the possibility of
collaborative arrangements within Denmark, with
other countries, and with the United States. The
goal is to establish a Danish center for human gene
mapping, LINK, starting with the family material
that has already been gathered and expanding the
collection, as well as drawing in researchers from
other institutes. LINK is envisioned as a
Scandinavian counterpart to the French CEPH effort (59).

The Danish Government has established 10 new
biotechnological centers and allocated D.kr. 500
million (about $80 million) for their operating
expenses over the next 5 years (6,59); 410 million
will be used to establish new research centers at
technical universities and private firms (24). The
biggest center, at Aarhus, is already supporting
some gene mapping research in collaboration with
CEPH.

Federal Republic of Germany 

The emergence of the environmentally oriented
Green party in West Germany, combined with a
general wariness about research with possible
eugenic applications, has made molecular genetics
research a sensitive political issue. Nonetheless,
research in molecular biology is well funded by
federal, state, and private monies. There are four
main sources of funds for basic research in
molecular genetics. The Max Planck Society, which
receives a substantial allotment from the federal
government but is legally independent, supports
the Max Planck Institutes, each of which is devoted
to a particular area of research (72). The German
Research Association (DFG) obtains approximately
half of its funds from the federal government and
half from state governments and supports
research in the universities. The German Ministry
for Research and Technology (BMFT) supports
projects in universities as well as funding the
Institute for Biotechnology Research and other
research institutes. Individual states contribute to
some science research through the universities.
Another source of potential support for genome
research is the prestigious Society for
Biotechnological Research (GBF), a government-funded
research center (78).

One indication of this attitude is that a federally appointed
commission of government and outside experts on genetic engineering
recommended, in early 1987, that there be "tight limits drawn for
analyses of human hereditary factors (genomic analysis) as well as
for gene therapy" (2). The commission published an extensive
report entitled Chances and Risks of Genetic Engineering after two
years of study. An English translation of the foreword and
recommendations of the report, entitled Gene Technology: Opportunities
and Risks (16), has been made available by the EEC. The DFG
criticized the recommendations of the commission in the case of
genome analysis, arguing that the search for causes and cures for
genetic defects is a scientific duty and serves public interest
(42).

At present, West Germany does not have a
coordinated genome mapping or sequencing project.
At a meeting in September 1987, representatives
of the DFG decided not to endorse a concerted
genome project, although the agency does
support a research program targeting molecular
methodology for studying the genome (52).

West Germans are strong supporters of
international cooperation. They consistently
contribute to EMBL, and several laboratories are
carrying out research that could be extended at little
expense and aligned with an international
collaboration in genome research.

Biotechnology is being actively promoted by the
federal and state governments in West Germany.
The Federal Ministry of Research and
Technology's Biotechnology Research Program, initiated
in 1985, includes as an objective the promotion
of "research and development projects in public
life care, including health, nutrition, and
environmental protection"; one of its high-priority
research areas is a program of "genetic engineering
with a focus on the investigation of gene
structures, research on gene functions, and on
controlling of genetic processes" (68). The
ministry has also encouraged the establishment of
research centers in which university and industry
would participate and has set up seven "gene
centers" to study areas including gene expression
and differentiation and the correlation between
gene structure and function. Human genome
mapping and sequencing are not explicitly included
in either the Biotechnology Research Program or
the genetic research centers, but both support
related research and could provide an institutional
infrastructure and funding framework for
genome research.

Finland 

In January 1987, scientists at the Finnish
Academy proposed a 5-year plan to improve
biotechnology and molecular biology research, in order
to promote industry and increase industrial
capabilities. The proposal included a request for the
equivalent of $37 million per year for research,
training, and equipment (48). Finland has
established several genetic engineering research
centers and has plans for half-a-dozen more; the
institute associated with the University of Helsinki
is perhaps the best known.

Human genome mapping in Finland is being
done by about 10 large and small individual
research groups in medicine and science. They are
primarily funded by government sources, namely,
university budgets and the Academy of Finland,
which is the main funding source other than
universities. The University of Helsinki hosted the
eighth international Human Gene Mapping
Workshop (HGM 8) (S). Finland has no concerted effort
nor any specific policies; as in most countries,
however, sequencing efforts have focused on
particular genes. Finnish groups are involved in
collaborative projects with groups in other countries,
notably the United States, and have contributed
to and received materials from international
databases and repositories.

France 

Since 1981, the French Government has sought
to make France a world power in science and
technology by increasing both funding and political
interest in research and development. The
Government has encouraged collaboration between
university and industry researchers, both within
the country and with the rest of Europe (e.g., the
EUREKA program).
The French Ministry of Research is directly or
indirectly in charge of nearly all
government-funded research. Most is carried out within
universities, often in units set up by the research
organizations, the largest of which is the National
Center of Scientific Research (CNRS). The CNRS
and the much smaller National Institute of Health
and Medical Research (INSERM) are the only two
government organizations that support research
related to human genome mapping and sequencing.
The Pasteur Institute in Paris, a
semi-autonomous institute that receives half its funds
from the government, carries out related research.
None of these organizations has announced a firm
plan for human genome mapping or sequencing,
but each is considering what part it might play
[Newmark, see app. A].

An important focus of genome studies in France
is the CEPH (see box 7-B). Organized in 1983 by
Jean Dausset to "hasten the mapping of the
human genome by linkage analysis with DNA
polymorphisms," CEPH is a privately funded center
that collects and distributes genetic materials for
use in mapping studies. It acts as an informal
coordinator for approximately 40 investigators in
Europe, North America, and Africa who use CEPH
materials in exchange for reporting their data
(25,26).

France has not initiated a coordinated genome
project, but there is a strong undercurrent of
opinion favoring a substantial program in human
genome mapping and sequencing as long as it is not
funded at the expense of other research. Genome
researchers may try to work through EUREKA
to involve other European companies with an
interest in instrumentation or information
technology. The French Government (usually through its
Ministry of Industry) is prepared to provide 50
percent funding for EUREKA projects, and there
are indications that it would consider CEPH's
human genome work eligible for EUREKA funding
[Newmark, see app. A].

Italy 

Recent administrations have given priority to
improving Italy's scientific performance in hopes
of sparking a technology-led revitalization of the
country's ailing economy. Considerable extra
funds for technology-related research have been
made available in the past few years, with
biotechnology as one focus. The Italian Government
announced in April 1987 that it would allocate
209 billion lire (approximately $156 million) over
a 5-year period for a national biotechnology
project involving both public research centers and
industry (64); the following month Italy's National
Research Council (CNR) announced a special
research project in biotechnology for which it will
spend 84 billion lire (about $63 million) over the
5-year period (51).

In May 1987, the CNR announced its decision
to initiate a project devoted to human genome
sequencing, to be run as a cooperative effort of all
CNR institutes and laboratories working in
biology (22). Nobel laureate Renato Dulbecco is
coordinating the project, in which CNR has started
investing 20 billion lire (about $15 million) and
75 to 100 person-years (51). A 2-year pilot project
with a budget of $1 million per year will be
undertaken first, to determine whether a large-scale
project will be funded at around $10 million a year.
(These sums are to cover only specific materials,
machines, travel, meetings, and so on, not
salaries and general overhead, since only the
existing number of personnel will be involved.)

A key question in the pilot project is whether
it is possible to isolate a single chromosome
without damaging it so much that sequencing would
be impossible. The ability to separate the
chromosomes would offer a shortcut to sequencing,
and researchers could begin sequencing with one
of the smaller chromosomes (but one with genes
of particular interest), probably chromosome 21,
22, or Y (73). Otherwise, researchers will consider
continuing the project using conventional
techniques. Research institutes and laboratories in
Rome, Naples, Pavia, and Milan will participate
in the project. Databases and information retrieval
will be managed by research units in Rome, Turin,
Milan, and Bari, with the aim of making the
national databases compatible with and
complementary to existing international ones (57,73).

The pilot human genome project is still
exploratory, so no attempt is being made yet to
coordinate work with researchers outside Italy. Project
scientists anticipate that the final project would
be complementary to, if not an integral part of,
any international project that arises [Newmark,
see app. A]. In the meantime, Italian scientists are
enthusiastic about Italy's role in genome mapping:
"there is good reason to believe that, for once,
this country will perhaps succeed in reaching the
starting line ahead of other countries" (73).
Italian scientists are not the only ones interested in
chromosome 21, however; it is a popular target
for research because it contains genes for
Alzheimer's disease and for Down's syndrome, and
it is likely to be an early focus of U.S. efforts.



Box 7-B.— The Center for the Study of Human Polymorphism (CEPH): 
An International Gene Mapping Center 

The Centre d'Etude du Polymorphisme Humain (CEPH) has become an important focus of international
scientific cooperation in the drive to map the human genome. CEPH is a private research foundation estab- 
lished in 1983 by French Nobel laureate Jean Dausset with the bequest of an anonymous donor. Its aim 
is to "hasten the mapping of the human genome by linkage analysis with DNA polymorphisms." 

The basic premise behind CEPH's activities is that a genetic linkage map (see ch. 2) will be more easily 
constructed if researchers study genetic material from a common group of families, a reference panel.
The most useful family pedigrees consist of four living grandparents with many children and grandchildren 
so that the inheritance of DNA can be traced through three generations. CEPH maintains DNA from a panel 
of 40 families, each with 5 to 15 children; in most cases, all grandparents are living. The DNA from 29
of the 40 families in the CEPH collection was contributed by Ray White and his collaborators from the 
Howard Hughes Medical Institute (HHMI) in Utah. Dausset also solicited family materials gathered by other 
researchers in the United States and Europe, including some material from normal families identified in 
the Venezuelan pedigree project. In contrast to that project (see box 7-A), in which researchers collected 
material from families with Huntington's disease in order to trace the gene responsible, CEPH maintains 
material from families with no known genetic diseases. The markers mapped to chromosomal locations 
in normal CEPH families can then be used to accelerate the search for disease genes in other families. 

CEPH coordinates an international collaboration of researchers from laboratories in Europe, North 
America, and Africa. In order to obtain material from CEPH, collaborating investigators must first possess 
DNA probes that detect genetic markers, generally RFLPs. They must agree to use the probes to test the
entire panel of 40 families and to provide CEPH with all of their data. There are no enforcement
mechanisms, but so far researchers have cooperated.

Dausset's work is supplemented by the efforts of Jean-Marc Lalouel, a mathematical geneticist at HHMI 
in Utah who has designed a variety of computer programs to record and analyze the data contributed 
by CEPH investigators. Lalouel and his collaborators have written programs that analyze genetic linkages 
and automatically sketch out gene maps from the results. These programs are sent out on disk with the
CEPH DNA samples. Researchers can record and analyze their data using the programs on the disk, then 
send the disk back to CEPH for inclusion in a central database. HHMI supports a database station at CEPH 
that will be linked to its Utah station and may soon include interactions with other databases as well.

An important factor in CEPH's success at fostering cooperative research is the two-tiered database it
maintains. One database, available only to collaborators, contains all data that investigators produce. At
the end of a year's time or when the results have been published, whichever comes first, data from the
collaborative database is moved into a public database, where it is accessible to any qualified researcher.
This system of having both a private and a public database ensures the timely sharing of information while
affording investigators some proprietary protection for their results. The fact that the collaboration
requires sharing of data (but not the actual probes, which could prove to be patentable) reduces potential
competitive tensions.
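
The release rule for the two-tiered database can be illustrated with a short sketch. This is not CEPH's software; the names `Submission`, `release_date`, and `is_public` are hypothetical, and "a year's time" is assumed here to mean 365 days counted from the date the data were contributed.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "a year's time" is taken as 365 days from submission.
EMBARGO = timedelta(days=365)


@dataclass
class Submission:
    """One data set contributed by a collaborating investigator (hypothetical)."""
    marker: str                       # label for the probe or marker tested
    submitted: date                   # date the data entered the collaborative database
    published: Optional[date] = None  # date the results were published, if any


def release_date(sub: Submission) -> date:
    """Date the data move to the public database:
    one year after submission or the publication date, whichever comes first."""
    deadline = sub.submitted + EMBARGO
    if sub.published is not None and sub.published < deadline:
        return sub.published
    return deadline


def is_public(sub: Submission, today: date) -> bool:
    """True once the data are in the public tier on the given day."""
    return today >= release_date(sub)
```

Under these assumptions, unpublished data contributed on 15 January 1987 would become public on 15 January 1988, while the same data published on 1 June 1987 would become public on that earlier date.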

SOURCE:
Industry is not playing a role in the pilot project,
since few Italian companies have the
technological interest or capability. But scientists involved
in the research believe that "the automation
required for the project will act as a major
incentive for industry" and hope that industry would
help finance the final project (73). At least one
Italian pharmaceutical company has expressed a
willingness to participate and contribute.

The United Kingdom 

The United Kingdom has a strong research tra- 
dition in molecular biology and genetics, and it 
has done pioneering work in the mapping of non- 
human genomes and in the development of se- 
quencing techniques. The United Kingdom has 
consistently ranked second to the United States 
in the number of articles on human gene map- 
ping and sequencing published annually in inter- 
national journals (see figure 7-1, table 7-1). The 
United Kingdom also ranks high in the develop- 
ment of physical mapping techniques and of auto- 
mated technologies for DNA manipulation and 
analysis. Thus the United Kingdom is well placed 
intellectually, if not financially, to contribute
significantly to mapping the human genome.

Basic biomedical research is funded mostly by
the government through the Department of
Education and Science, although both the Department
of Health and Social Security and the Department
of Trade and Industry have funds available for
contract research. The Department of Education
and Science distributes research monies through
universities and through five research councils.
The research councils provide support for
scientific programs carried out in universities; some
councils also support research within their own
institutes. Biotechnology is an area of overlap for
the Science and Engineering Research Council
(SERC) and the Medical Research Council (MRC),
the two councils whose areas of interest are most
closely related to human genome research. The
science and engineering council supports basic
biological research outside the medical field,
although it has supported some work on automated
DNA sequencing through a biotechnology
directorate established to link academic research to
industrial needs. The MRC is undoubtedly the
leading supporter of mapping and sequencing
research. Its total expenditure for genome-related
research for the 1985-1986 fiscal year, both
direct and indirect, was approximately £4.2
million ($7.4 million) [Newmark, see app. A] (88).

The MRC is similar to the NIH in supporting
high-quality, investigator-initiated proposals, although
the council also establishes targeted programs in
particular areas. It has a longstanding commitment
to molecular biology and has the power to set up
new units devoted to particular areas of research
when a suitable director and sufficient funds are
available. Although the MRC supports a good deal
of relevant research and its various units and grant
holders have the expertise and instrumentation
necessary for the study of genetic disease, the MRC
does not now plan a targeted program of research
on human genome mapping or sequencing. At a
1987 meeting, however, the MRC did endorse the
plan of an employee, well-known scientist
Sydney Brenner, to map the human genome (largely
with private funds) as long as the research
proceeded at no extra cost to the research unit
Brenner directs (66). At Brenner's request, the MRC
has also agreed to set up a committee that will
consider questions such as who owns the clones
produced in mapping efforts and how best to
provide public access to them.

"It has been agreed [by the MRC] that the human genome work
should constitute a separate project to be carried out as an
extension of the work of the [Molecular Genetics] Unit [in Cambridge].
It was also considered that the longer term future of this work could
not be tied to the finite tenure of a personal Unit. The project might
evolve into a reference laboratory with a major service component
and would then need a different funding structure. A central aim
would be to ensure that the collection of clones and information
remained in the public domain. It was therefore agreed that an
Advisory Board be established to consider these and other policy
matters" (66).

Brenner's project will be financed in part by a
£300,000 (about $525,000) prize award he
received from the Louis Jeantet Foundation; the
MRC and other sources will provide another
£200,000 to £250,000 (about $350,000 to
$440,000) per year (56). The project will build on
a mapping technique developed by Alan Coulson,
John Sulston, and co-workers in the MRC research
unit at Cambridge. They compiled a genetic
linkage map of the nematode Caenorhabditis elegans
genome, the smallest genome known for any
multicellular creature (it is estimated to be 80 million
base pairs, compared with approximately 3
billion base pairs for the human genome; see ch.
2). Brenner expects that perhaps half of the
genome could be mapped by a few people within
5 years. The project will include research on
data-handling methods and parallel processors, since
the mapping techniques require sophisticated
computing capabilities.

The Imperial Cancer Research Fund (ICRF), a
charitable organization financed solely by
donations, has recently recruited scientists to work on
the development of a different technique for
human genome mapping, as well as related software
and instrumentation [Newmark, see app. A]. The
MRC and ICRF plan to explore the possibility of
collaboration in areas of common interest.



Other efforts in the United Kingdom include 
technology development in automated systems for 
genome sequencing at the University of Man- 
chester Institute of Science and Technology 
(UMIST) (1) and biocomputing research at the 
University of Edinburgh. The Edinburgh Biocom- 
puting Research Unit has considerable experience 
in database searching and related problems and 
is undertaking a variety of studies into the infor- 
matics needed for analysis of map and sequence 
data (15). 

The United Kingdom contributes to interna- 
tional research efforts such as EMBL, to which 
the MRC provided £2.72 million (about $^.7 million)
in 1987. The MRC maintains a level contribution
to EMBL in real terms, after supporting
some growth of the organization in 1982, when
the new director was appointed (66).



OTHER INTERNATIONAL EFFORTS 



Australia 

The largest research institution in Australia is 
the Commonwealth Scientific and Industrial Research
Organization (CSIRO), which is conducting
pertinent research through its Division of
Molecular Biology. Biomedical research is primarily
the province of the National Health and Medical
Research Council, which at present funds a
number of researchers working on gene mapping 
and sequencing. The Department of Human Ge- 
netics and the Medical Molecular Biology Unit at 
the Australian National University in Canberra are 
sites of some relevant research activity. In partic- 
ular, chromosomes 6 and 9 are the foci of investi- 
gation because several genes have been localized 
to them (43,82). Researchers at the Cytogenetics 
Unit Department of the Adelaide Children's Hos- 
pital in North Adelaide are constructing maps of 
chromosome 16 and part of the X chromosome. 
They have collaborated with scientists from the 
U.S. Department of Energy's Lawrence Livermore
and Los Alamos National Laboratories. 

The Department of Industry, Technology and 
Commerce administers a system of research 
grants under its National Biotechnology Program, 
with priority areas including genetic engineering
and cell manipulation and culture, which could
provide support for genome research. 

Canada

Canada does not yet have a national policy on 
genome sequencing. The National Research Coun- 
cil (NRC) is considering the creation of a task force 
to address this subject within its laboratories. A 
national network of biotechnology laboratories 
supported by the council has been set up, including
the Biotechnology Research Institute in Montreal,
the Plant Biotechnology Research Institute
in Saskatoon, and the Division of Biological Sci- 
ences in Ottawa, which focuses on protein engi- 
neering. 

In addition to the expertise that the government 
research institutes might lend to genome research,
Canada has 15 to 25 university laboratories with 
the necessary skills and equipment to participate 
in a human genome project. To date, however, 
there has been little effort to coordinate the activ- 
ities of these various groups. Canadian scientists 
and government officials are paying close atten- 
tion to international developments in human ge- 
nome sequencing and are hopeful that opportu- 
nities for international collaboration will develop 
(67). 
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Latin America 

Relatively few laboratories are involved in hu- 
man genome research; of those that are, the pri- 
mary interest is generally mapping genes for dis- 
eases of particular national significance. As one 
observer pointed out, "Brazil has its share of good
scientists, but they are hampered by lack of funding
and difficulties importing equipment and materials";
presumably the same holds true in other
Latin countries (13). 

Many Latin American countries realize the commercial
potential of biotechnology; Brazil and Argentina,
among others, have initiated programs
to encourage biotechnology research and development.
Argentina has a biotechnology program
under the aegis of its Secretariat of Science and
Technology t'^O), and Brazil has a Biotechnology
Secretariat in its Ministry of Science and Technology
(13). Scattered throughout Latin America
are individual laboratories doing relevant research.

In Mexico, "scientists are pushing the Mexican
government to consider the development of genetic
research a priority. They don't want to fall
behind on this kind of research, because the
pathology index in the Mexican population is approaching
that of developed countries. With epidemics
and infections decreasing, greater attention
can be paid to genetic problems" (69). Like
Brazil, however, Mexico has a low research budget
(less than 0.6 percent of the gross national
product is spent on research) and can neither afford
sophisticated equipment nor train enough
scientists; both countries are interested in international
cooperation. The Organization of American
States reports that its Department of Scientific
and Technological Affairs, which runs a
Regional Program for the Development of Science
and Technology in Latin America and the Caribbean,
includes projects in plant and animal
genetics but none in human genetics (65).



South Africa

Gene mapping and sequencing research is supported
by the Medical Research Council (MRC),
the Council for Scientific and Industrial Research
(CSIR), and the National Cancer Association. None
has initiated a formal or coordinated attempt to
map or sequence the human genome, but there
are a number of laboratories at work in the field
of human genetics (30). Several researchers are
active in the CEPH collaboration, screening the
CEPH family materials and contributing their results.
Researchers are examining genes for Huntington's
disease, cystic fibrosis, and neurofibromatosis
in collaboration with laboratories in
the United States and the United Kingdom (10).
Research is also underway on several genes of
particular interest in the region: those for Oudtshoorn
skin disease and familial hypercholesterolemia
(conditions prevalent in Afrikaners) and
albinism, which is common in the Bantu population
(50).

The Union of Soviet Socialist 
Republics and Eastern Europe 

Although the Soviet Union has not been a major
contributor to mapping and sequencing studies
published in international journals, it has published
some research on bacterial genomes (74)
and the barley genome (3). Soviet scientists are
also working on computational methods for analyzing
DNA sequences (7). The Central Institute
for Molecular Biology in East Berlin has undertaken
a variety of studies in gene mapping and
sequencing and has collaborated with researchers
in the United Kingdom (45). Bibliometric analyses
(see figure 7-1 and app. C) show that the Soviet
Union and Eastern European countries have
not published a significant number of research
articles on human gene mapping and sequencing.
These figures tend to select items from international
journals, however, so internal publications
are not as thoroughly catalogued and accounted
for.






INTERNATIONAL COLLABORATION AND COOPERATION 



The large size and humane mission of human
genome projects make them ideal candidates for
international collaboration. International databases
have already been established and are being
jointly maintained, which indicates some willingness
to cooperate on gene mapping efforts, but
it remains to be seen how far that cooperation
will extend. The potential for commercial payoffs
raises difficult questions but does not preclude
successful collaboration as long as prior agreement
on allocation of benefits is reached (32,49).
The following sections recount some precedents
for collaboration and cooperation in international
science projects and the role the United States has
played in them. Organizational options available
for international human genome projects are examined,
and some collaborative efforts already
underway are described. The following chapter
outlines the questions of international technology
transfer that will undoubtedly arise in any coordinated
international effort.

Precedents for International 
Scientific Programs 

The biological sciences have been organized into
international projects far less often than other sciences,
but collaborations in the physical and space
sciences can provide useful organizational insights.
The International Geophysical Year, box 7-C, is
an example.

Since the 1940s, research in particle and high-energy
physics has relied on complex and expensive
equipment (notably, the particle accelerator)
that is beyond the ability of any individual
investigator, or even any one institution, to con- 
struct and maintain. Consequently, a number of 
large, specialized laboratories have emerged na- 
tionally and internationally. In the United States, 
centralized facilities evolved into a network of na- 
tional laboratories, now operated by DOE. These 
laboratories house cyclotrons, synchrotrons, and 
other advanced instruments and undertake re- 
search in a broad range of areas, cooperating in 
limited ways with researchers from abroad. 

The European Center for Nuclear Research 
(CERN) was established in 1954 to advance knowledge
in the field of particle physics. It is operated
by 14 European nations and has provided a frame- 
work for collaboration in instrumentation. Its 
governing council consists of one technical advi- 
sor and one administrative advisor from each 
member nation. Participants contribute to CERN 
based on their gross national products, although 
no nation can contribute more than one-quarter 
of CERN's annual operating budget. CERN has en- 
abled European nations to conduct research be- 
yond the capabilities of any single member na- 
tion and has been widely recognized for its success 
in the advancement of particle physics. It has re- 
stricted its efforts to basic research, however, and 
so has avoided the complications that arise in col- 
laborative work on applied research (80). 
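
The CERN funding rule described above (contributions based on gross national product, with no member paying more than one-quarter of the budget) can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the member names and GNP figures are hypothetical, and the way the excess from a capped member is redistributed (proportionally among the remaining members) is an assumption, since the text does not describe how CERN handles that step.

```python
def capped_shares(gnp, cap=0.25):
    """Split a budget in proportion to GNP, with no share above `cap`.

    Assumption (not from the report): any excess above the cap is
    redistributed among the uncapped members in proportion to GNP.
    """
    if cap * len(gnp) < 1.0:
        raise ValueError("cap is too low for this number of members")
    shares = {}
    free = dict(gnp)      # members whose share is not yet fixed
    remaining = 1.0       # fraction of the budget still to allocate
    while free:
        total = sum(free.values())
        over = [name for name, value in free.items()
                if remaining * value / total > cap]
        if not over:
            # Nobody exceeds the cap: split what is left proportionally.
            for name, value in free.items():
                shares[name] = remaining * value / total
            break
        # Fix every over-cap member at the cap and repeat with the rest.
        for name in over:
            shares[name] = cap
            remaining -= cap
            del free[name]
    return shares


if __name__ == "__main__":
    # Hypothetical GNP figures in arbitrary units.
    example = {"A": 50, "B": 20, "C": 15, "D": 10, "E": 5}
    for name, share in capped_shares(example).items():
        print(f"{name}: {share:.1%} of the annual budget")
```

A cap of this kind keeps any single large economy from dominating the budget, at the price of a somewhat heavier burden on the mid-sized members.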

The enormity of the endeavor to explore and 
study space spawned proportionately large agen- 
cies to manage the research. The founding legis- 
lation of the United States' National Aeronautics 
and Space Administration (NASA) included inter- 
national cooperation as a major theme, and NASA 
has carried out that mandate by negotiating and 
implementing hundreds of cooperative projects. 
Some NASA projects have established formal joint 
working groups on a bilateral basis with other 
national agencies. These groups meet several times 
a year to "discuss present and future projects of 
mutual interest, and to exchange information on 
scientific and management issues of concern" (61). 

One of NASA's major partners has been the 
European Space Agency (ESA), a collaboration of
13 European nations. The Hubble Space Telescope 
is an example of collaboration between the two 
agencies. In 1977, officials from NASA and ESA 
drew up an agreement to work together on the 
project, citing specific contributions and respon- 
sibilities (37). An article on data rights directed 
that scientific data from the telescope be reserved 
for analysis for one year, then turned over to public
data centers. Results were to be made available
to the scientific community through publication
as soon as possible and appropriate. No
specific provisions were made for patenting prod- 
ucts or processes developed in the course of the 
project. 
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Box 7-C.— The International Geophysical Year 

The International Geophysical Year (IGY) was originally conceived as the third in a series of international
polar years (earlier cooperative investigations into the phenomena of the Arctic and Antarctic took
place in 1882-1883 and 1932-1933), but the scope was expanded to include the study of all aspects of the
physical environment. Sydney Chapman, one of the organizers, described the enormous undertaking as 
it finally evolved: 

The main aim is to learn more about the fluid envelope of our planet— the atmosphere and oceans— over all 
the earth and at all heights and depths. The atmosphere, especially at its upper levels, is much affected by disturbances
on the Sun; hence this also will be observed more closely and continuously than hitherto. Weather, the
ionosphere, the earth's magnetism, the polar lights, cosmic rays, glaciers all over the world, the size and form of
the earth, natural and man-made radioactivity in the air and the seas, earthquake waves in remote places, will
be among the subjects studied. These researches demand widespread simultaneous observation.

To accomplish this, teams of scientists from 67 nations, 60,000 in all, observed, measured, and recorded
data in meteorology, geomagnetism, auroras and airglow, the ionosphere, solar activity, cosmic rays,
oceanography, glaciology, gravity measurements, and other disciplines over a period of 18 months in
1957 and 1958.

The effort was coordinated by the Special Committee of the IGY (CSAGI) under the auspices of the 
International Council of Scientific Unions. Planning committees were appointed to organize research 
programs in 14 different disciplines. Participating nations generally had their own planning commissions
or advisory boards as well.

An essential feature of IGY was the operation of world data centers. Participants agreed to send 
all of their data to three major centers, in the United States, the U.S.S.R., and Western Europe. Organizations
or investigators from any country could obtain copies of the deposited materials free of charge
(other than the price of reproduction and transmission). In addition, the data were summarized and
presented in more than 30 volumes in the Annals of the International Geophysical Year, an information
resource that provided the raw material for subsequent research in geology, meteorology, oceanography,
and other fields.

SOURCES:

S. Chapman, Annals of the International Geophysical Year, foreword quoted in H. Newell, Beyond the Atmosphere: Early Years of Space Science (Washington, DC: NASA, 1980).
S. Chapman, IGY: Year of Discovery (Ann Arbor, MI: University of Michigan Press, 1959).
J.M. England, A Patron for Pure Science: The National Science Foundation's Formative Years (Washington, DC: National Science Foundation, 1982), pp. 297-304.
H. Newell, Beyond the Atmosphere: Early Years of Space Science (Washington, DC: NASA, 1980).
W. Sullivan, Assault on the Unknown: The International Geophysical Year (New York, NY: McGraw-Hill, 1961).



NASA's operating principles for international
collaboration are a useful starting point for drawing
up collaborative agreements. One key difference,
however, between human genome projects
and most space research is the commercial



NASA has never formally encoded its mechanisms for international
collaboration, but it has developed an informal set of guidelines:

• Cooperation is on a project-by-project basis, not on a program
or other open-ended agreement.

• Each project must be of mutual interest and have clear scientific
value.

• Technical agreement is necessary before political commitment.

• Each side bears full financial responsibility for its share of the
project.

• Each side must have the technical and managerial capabilities
to carry out its share of the project; NASA does not provide
substantial technical assistance to its partners, and little or no
U.S. technology is transferred.

• Scientific results are made public (55).
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potential: "Astronomical data have no commer- 
cial value" (71). The gap between research in 
molecular genetics and the market has narrowed 
rapidly in recent years, making the boundary between
basic and applied or development-oriented
research nearly impossible to draw. Consequently,
agreements similar to those negotiated by NASA 
and ESA regarding data rights and publication of 
results could prove insufficient for human genome 
projects. A second difference is that the instrumen- 
tation required for human genome projects is nei- 
ther as large nor as expensive as that used in par- 
ticle physics and space research. 

In spite of a stated desire for international cooperation,
the United States has generally acted
as the primary partner in large science projects,
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defining them and then inviting other nations to 
join in, rather than planning, funding, and implementing
projects jointly (54). In the present era
of constrained funding, however, the United States 
may not always be able to carry out major re- 
search projects on its own. 

Collaborative projects can offer significant sav- 
ings for participating countries by splitting the 
financial burden (although some observers have 
pointed out that the costs of negotiating and the 
loss of jobs if a project is located outside the United 
States may reduce the savings). Collaboration cre- 
ates a paradox, however: On the one hand, it might 
reduce the cost for each member, making the proj- 
ect more feasible; on the other, it might reduce 
each nation's potential economic gain from the
project. The world economic situation has led to
an increasing desire for scientific research to pro- 
duce commercially valuable products, thereby 
fostering a protective, nationalistic attitude toward 
research (see box 7-D). 

Options for International 
Organization of Genome Research 

A decision to pursue human genome projects 
on the international level, emphasizing coopera- 
tion and participation, will entail considerable or- 
ganizational effort. It will have the same organiza- 
tional goals as a domestic effort: to eliminate 
redundancy in research and to expedite the spread 
of scientific and commercial knowledge of the ge- 



Box 7-D.— Views on International Cooperation and Collaboration in Genome Research 

"Too many promising international research collaborations, from AIDS research to the sequencing of
the human genome, languish for lack of a workable framework for tangible and short-term research. ...
The U.S. Department of Energy and the Japanese Science and Technology Agency have an interest in organizing
and supporting the [genome] project; each seems sensibly to have decided that two independent
projects would be a waste of resources and a source of confusion, but [they] differ sufficiently in their
objectives as to impede agreement between themselves, let alone with others." Editorial in Nature 328:187,
1987.

"There's a task to be done here, and we need to get on with the task. If we try to take into account
every country's interest and concerns, we can only serve to delay it." J. McConnell, Johnson & Johnson,
Science Writers' Workshop, Brookhaven National Laboratories, Upton, NY, Sept. 14, 1987.

"An international DNA analysis center or centers equipped with super sequencing systems which are 
connected to a worldwide data network should be developed." A. Wada, "Many Small-Scale or a Few
Large-Scale DNA Sequencers?" unpublished report, Japan, 1987.

"It is highly desirable that the U.S. continue to be the leader of the [genome mapping and sequencing]
effort, but it must be consciously and effectively run as an international quest for knowledge having universal
importance. No single purse nor administrative center, in either the U.S. or the world, can or should
be created to fund or attempt to direct the task." D. Fredrickson, National Institutes of Health, personal
communication, December 1987.

"There is ... a growing awareness in Europe that the first megaproject in biology is shortly being 
launched. Europe ought to participate in it alongside the USA and Japan to ensure access to the information 
and all that it implies for medical and biological science, as well as the technological spinoffs that will surely
arise. ... There is now an opportunity to ensure that the project involves international collaboration from
its outset which should not be missed." L. Philipson and J. Tooze, "The Human Genome Project," Biofutur,
June 1987, pp. 94-101.

"If they wished, either Western Europe or Japan could by themselves take on this project and it must 
be assumed that they will initiate their own efforts. So a new international body should soon be formed 
to ensure that collaboration, not competition, marks the relationship between these efforts in various parts 
of the world. In a real sense, the exact sequence of the human genome will be a resource that should 
belong to all mankind. So it is a perfect project for us to pool our talents, as opposed to increasing still 
further the competitive tensions between the major nations of the world." J.D. Watson, director's report 
for Cold Spring Harbor Laboratories, in press. 
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"The principle of 'mutual self-interest' ... lies at the heart of successful cooperation." D. Dickson and
C. Norman, "Science and Mutual Self Interest," Science 237:1102, 1987.

"If a sequencing factory can be built, Wada emphasizes that it would not be 'Japan Incorporated' against
the rest of the world. He wants an international centre that would be open to scientists of all nationalities
and intended for the benefit of all mankind." D. Swinbanks, "Human Genome: No Consensus on Sequence,"
Nature 322:397, 1986.

"This project is so vast that it necessarily requires international cooperation. Since there are 3 billion
bases to be sequenced, the project will not create problems of competition." P. Vezzoni, Consiglio Nazionale
delle Ricerche, Milan, quoted in A. Sommariva, "And Italy Will Study Chromosome 22," Italia Oggi (Milan),
May 22, 1987, p. 36.

"There's considerable interest in the commercial spinoffs, and I expect each country would want to
keep those. I would hate to see U.S. tax dollars used to kill yet another U.S. industry." J. McConnell, Science
Writers' Workshop, 1987.

"On the one hand, the climate for international collaboration in science ... is warmer than ever. In
virtually every major field, U.S. scientists can point to significant work being done in Europe, the Soviet
Union, Japan, Canada, or Israel that needs to be read closely, argued about, and replicated as much as
does work done in the United States. On the other hand, the new era is chillier, for governments and
businesses here and abroad will continue to try to squeeze economic value out of every bit of science to
win the international high-tech sweepstakes." D. Shapley and R. Roy, Lost at the Frontier: U.S. Science
and Technology Policy Adrift (Philadelphia, PA: ISI Press, 1985), p. 116.

"The creation of a sequence database is the major goal of the project, whether it is done nationally
or internationally or privately. ... I don't think an international project as an organized scheme will emerge.
... I expect a set of private ones will emerge, with some level of cooperation." W. Gilbert, Harvard University,
Science Writers' Workshop, Brookhaven National Laboratories, Upton, NY, Sept. 15, 1987.

"I am convinced that an international advisory body must be formed to oversee the data bases. ...
International cooperation [is] as important as interagency coordination in the U.S.A. But I do not think
that a special institution would be useful at the national and at the international level." A. Lafontaine, Office
of the Secretary General, Brussels, Belgium, personal communication, June 1987.

"There is a strong belief here that practical collaborations on actual, well-defined projects are very
helpful, and probably more meaningful than large-scale collaboration between governments. Cell banks,
gene banks, and databases are very important in this regard." A. de la Chapelle, University of Helsinki,
personal communication, August 1987.

"International cooperation is not something that should be imposed by government agencies. ... Real
cooperation comes from individual scientists communicating with each other." C. DeLisi, Mount Sinai School
of Medicine, Science Writers' Workshop, Brookhaven National Laboratories, Upton, NY, Sept. 15, 1987.

"I'm just concerned that if we focus on trying to set up an international effort, we will delay decisions 
of the United States in proceeding with this. I'd like to see a willingness to cooperate at the international 
level, but setting U.S. national priorities." G. Cahill, Howard Hughes Medical Institute, comments at Issues
of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987. 

"[T]he United States does not and cannot expect to monopolize information and innovation in this field.
Moreover, the initiation of a human genome project in the United States will probably not deter work
in other countries, but rather will stimulate it. Given this assumption, the importance of past traditions,
and the magnitude of the task of mapping and sequencing the entire human genome, every effort should
be made to enhance the existing contacts between the United States laboratories and those overseas, so
as to speed the work. Indeed, we believe it will become necessary to have some major organized mechanism
for international cooperation. In particular, its objective would be to collate data and ensure rapid accessibility
to it, as well as to distribute materials, such as cloned DNA fragments." National Research Council,
Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988), p. 85.
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nome. Just as the issues in domestic organization 
revolve around distribution of authority and tasks 
among interested government agencies and pri- 
vate firms (described in ch. 5), the issues in inter- 
national organization involve coordination of in- 
terested sovereign nations. 

An international organization could be either
passive or active. A passive organization would
serve primarily as a clearinghouse of research information
among participating nations. This task
would require the formulation and oversight of
standard nomenclature and the translation of research
reports. The organization would need to
keep track of research in progress and any technological
innovations reported by individual laboratories,
and it might be intimately associated
with databases such as GenBank and the EMBL
data bank and with collaborative organizations
such as CEPH. Although participation in this type
of passive organization would have to be voluntary,
all academic researchers would stand to benefit
from the free flow of information. The proprietary
interests of commercial researchers
might limit their participation, but collaborative
arrangements could be made (12,49). The success
of a passive international organization depends
primarily on the good will of the participants.

An active international organization along the
lines of the interagency task force described in
chapter 6 could plan and distribute genome research
among participating countries. There are
at least three ways in which the tasks of an international
genome project may be distributed:
1) by physical units, such as chromosomes
or genes, in which each country would analyze
one unit or a group of units; 2) by project aspect,
such as sequencing, informatics, or cloning, in
which each country would focus on one aspect;
and 3) by geography, in which each country or
group of countries with similar resources would
establish a genome center.

Distribution by physical units would require
each participating nation to possess the entire
spectrum of technical specialties associated with
the projects (mapping, sequencing, data management,
and so on). This requirement would probably
limit involvement to those nations that are
already scientifically advanced, regardless of any
interest among nations attempting to develop biotechnical
capabilities. The requirement could,
however, spur developing nations to acquire technologies,
and it might provide an economic incentive
for commercial firms to assist in the start-up
efforts. Assignment by chromosome would most
likely cause intense politicking among the top nations
for the most "interesting" chromosomes. Certain
countries or regions might be more interested
in chromosomes known to contain genes that affect
a large portion of their populations. Such a
method of assignment would also identify a specific
nation with a specific achievement, effectively
placing flags on the map of the genome. The realization
of this would inject an element of competition
for national prestige into the context of an
international science project. In effect, the cooperative
partners would be establishing the arenas
and ground rules for competition.

An international project divided by project 
aspect would require participating countries to 
adopt a specialty, which would accelerate devel- 
opment and commercial profit in that field but 
could preclude achievement in related fields. Ja- 
pan, for example, might contribute a large share 
of DNA sequencing because of its interest in automating
sequencing technologies. The component
tasks of a genome project are not equivalent nor
easily evaluated in terms of necessary resources,
so distributing them may prove difficult. Further, 
some aspects of the project are more visible and 
economically valuable than others. To map or se- 
quence an important gene is noteworthy and prof- 
itable; to create a database is to provide a com- 
mon good but to receive little of value in return. 
An international division of labor is an attractive 
idea, but only clearly defined special talents among 
the nations would justify it. 

The third possible distribution of international 
efforts is geographical— several genome centers 
could be established and supported by a nation 
or group of nations. The vocation of these centers 
might become a point of debate, however: Should 
each cover the full spectrum of genome technology,
or should they specialize? If each center attempted
to cover all technologies, a division of



The idea of setting up large centers has been promoted by American
scientist and entrepreneur Walter Gilbert (44,^ and by Japan's
Akiyoshi Wada iM,SS,B7). Both have referred specifically to sequenc- 
ing rather than to genome research in general. 
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labor might evolve based on specialized innova- 
tion. This might keep the centers complementary 
and competitive, but not necessarily cooperative.
Establishing specialized centers would predetermine
each center's scientific and economic success.
Focusing all of them on a single aspect, for
example sequencing, would siphon funds and attention
from the other aspects. A center arrangement
involving only countries with state-of-the-art
research capabilities might lock out interested
countries just beginning to develop biotechnology
capabilities, unless the centers were amenable
to taking on minor partners. Few scientists
other than the two who have proposed the se- 
quencing center idea seem to be enthusiastic about 
the prospect of establishing large centralized in- 
stitutions (see box 7-D). 

If an international project is to be pursued, issues
of participation and underlying motivations
should be recognized clearly and early. Without
specific guidelines for initial and future participation,
any organization is likely to become entrenched
and inaccessible to latecomers. If the
motivation for an international distribution of effort
is purely economic, then participation might
be restricted to nations already able to demonstrate
their ability to contribute. Should an international
effort be tied to political goals such as
assisting the growth of biological research and
biotechnology in the developing world, then widespread
participation and an organization capable
of coordinating both advanced and developing
countries would be necessary. If political motives
are acknowledged, then the international organization
might seek to encourage the association
of national goals and priorities with genome research.
Political motivations are probably inherent
in international projects, but they could be
used to elicit widespread participation and continuing
commitment. By using enticements such
as distribution of physical units of the genome
by political units of the participants, it may be possible
to guide nationalistic forces into a workable
international effort.

An important factor in any international col- 
laborative or cooperative agreement will be the 
participants' domestic organization of human ge- 
nome projects. The agencies involved speak with 
many voices, depending on their respective mis-
sions. Formal collaboration would be difficult to
negotiate without some domestic coordination (see 
chs. 5 and 6) to harmonize goals. Otherwise, less 
formal cooperative arrangements will probably
prevail. 

Even if there emerges no formal international 
organization that can satisfy national and propri-
etary goals, the United States could establish an
international advisory board to solicit suggestions
and recommendations from the international sci-
entific community regarding human genome proj-
ects. Domestic advisory boards could include
nonvoting members from Europe and Japan. An
international advisory committee for database
oversight already exists; it has two members from
the United States, two from Japan, and several
from Western Europe (14). Members of the com-
mittee issue recommendations that, although not
binding, help coordinate the various national
efforts. 

Existing Collaborative Frameworks 

Lack of an international organizational struc- 
ture does not preclude informal collaboration or 
cooperation. Scientific laboratories exchange 
views, visits, and materials as a matter of daily
practice; many scientists prefer informal network- 
ing to prescribed arrangements and institutions 
(see box 7-E). Policymakers in Europe are finding 
that increasing support for laboratory networks, 
rather than establishing centers, can be an effec-
tive way to conduct research on a limited bud-
get. Many of the scientists involved in human ge- 
nome research host visiting foreign scientists and 
graduate students regularly. 

The United States already finances international 
collaboration in biomedical research to a certain 
extent through the normal funding mechanisms 
of the National Institutes of Health, which may
award grants to U.S. investigators "whose work 
involves substantial collaboration with foreign in- 
stitutions" (63). Researchers affiliated with foreign 
institutions are eligible for grants and contracts; 
in fiscal year 1984, NIH spent $35 million on for-
eign grants, roughly half the budget allotted to
international activities. NIH also gives grants for 
foreign or international conferences and for in- 
ternational research fellowships. 
Box 7-E. — Large Centers v. Networking



The development of international sequencing centers draws enthusiastic response from some quarters 
and skepticism from others. Proponents such as Walter Gilbert and Akiyoshi Wada advocate the creation 
of several international centers containing advanced sequencing equipment as the most efficient way to 
sequence the genome, if not to map it. Critics contend that establishing large central institutions reduces 
the innovation spawned by small research laboratories doing investigator-initiated projects. Other critics, 
including many industrialists, argue against "naive internationalism," stating that the task at hand should
be done posthaste, without lengthy delays while international negotiations decide on the division of labor,
responsibilities, and benefits.

One solution that could satisfy critics of both stripes is networking— strengthening the links between
existing laboratories— rather than starting up new research centers. Networking has recently gained popular-
ity in the European community; indeed, Dickson has written that "the top political priority given to the
idea that governments should focus their efforts on linking together scientists in existing laboratories—
rather than on creating major centers or research facilities—has become perhaps the most important shift
in European-level science policy in the 1980s."

Various research programs supported by such organizations as the European Science Foundation and
the European Economic Community (EEC) have adopted networking strategies in lieu of costlier and more 
contentious decisions to set up central collaborative facilities. The ESF has supported laboratory networks 
for research in areas including polar science and individual psychological development. Particularly rele- 
vant for genome research is a network on the molecular neurobiology of mental illness, in which scientists 
are hunting for pedigrees of families with psychiatric problems in order to locate informative genetic poly-
morphisms for linkage analysis studies (see ch. 2 and box 7-A). The EEC supports research under the Stimu-
lation Program, providing money to allow scientists from different countries working on the same project
to meet, perform joint experiments, and so on. One successful project that the Stimulation Program funded,
according to Dickson, was "a research program into the development of new high-field magnets, which 
now links together scientists working in 58 research institutions in the 12 member states of the EEC." The
EEC's Biotechnology Action Program, which encourages a transnational approach to the research it spon- 
sors, has developed a similar networking approach— European Laboratories Without Walls (ELWWs). ELWWs 
link individual researchers from laboratories in different institutions (preferably in more than one country)
together for multidisciplinary but focused, precompetitive research projects. The ELWW program empha- 
sizes rapid, open flow of information and material between participants and incorporates joint planning 
and evaluation of the scheduled experiments. 

Perhaps because European laboratories have traditionally been poor at communicating beyond their 
national borders— European scientists are more likely to collaborate or cooperate with American scientists 
than with other Europeans—the networking strategy has met with increasing enthusiasm and has fostered
notable successes. Whether the strategy would work to link Europe, Japan, and the United States is not
certain. Even within Europe there are potential problems. Networking could lead to the support of elite 
research groups and exclude those from poorer countries that do not yet have the facilities to be desirable 
research partners. For projects with potential commercial value, proprietary rights and the open exchange 
of information can become troublesome issues. Dickson reports that some policymakers argue that "the 
relative absence of centralized strategic thinking could turn out to be a major weakness." Despite these 
caveats, networking is a model for international organization that could reduce the anxieties accompanying 
the planning and implementation of international cooperative or collaborative projects.
DOE also engages, to a limited extent, in inter-
national research cooperation and collaboration
through its national laboratories. It has been crit-
icized, however, for earning "a poor reputation
abroad for long-term commitment to international 
collaborations," which "will make it extremely dif- 
ficult for DOE to attract foreign countries into sig- 
nificant new partnerships" (31). So far, however, 
DOE scientists working on genome projects have 
collaborated freely with researchers from other 
countries (82). 

Existing research organizations can also become 
centers of collaboration. CEPH coordinates over 
40 international investigators and research lab- 
oratories for mapping studies (see box 7-B). It sends 
genetic materials to major gene mapping labora- 
tories around the world; in exchange, the labora- 
tories share their results and data. 

Washington University-
RIKEN Collaboration

A recent agreement between researchers from 
Washington University in St. Louis, Missouri, and 
the Institute of Physical and Chemical Research 
(RIKEN) in Tsukuba, Japan, illustrates the poten-
tial of international collaboration at the level of
individual institutions (38). The 3-year program, 
effective November 1, 1987, enables researchers 
from Washington University's new Center for 
Genetics in Medicine (founded by a donation from 
philanthropist James McDonnell) to work with re- 
searchers from the Tsukuba Life Sciences Cen- 
ter of RIKEN. The research will combine the ex- 
pertise of the university's scientists in cloning yeast 
cells with the technological know-how of the 
RIKEN scientists, who have developed automated 
DNA analysis equipment. The initial focus of the 
research will be to sequence the entire yeast ge- 
nome and to improve techniques for cloning hu- 
man chromosomes into yeast cells. 

This collaboration, the first bilateral agreement 
between American and Japanese scientists in the 
field of genetics, also provides for information and 
personnel exchanges with the Pasteur Institute 
in Paris and the Academia Sinica in Shanghai, 
China. Data and results from the collaboration will 
be disseminated freely to the international com- 
munity. 
International Human Gene
Mapping Workshops 

A series of biennial international gene mapping
workshops— the ninth (HGM 9) was held in Paris 
in September 1987— has provided a mechanism 
for extensive international interaction. Prior to 
each workshop, committees are appointed for 
each of the human chromosomes. The commit- 
tees are in charge of evaluating the research that 
has been done on the chromosome; they solicit 
papers from the international research commu- 
nity and select the ones to be presented. At the 
workshop, the committee for each particular chro- 
mosome works toward a consensus on which 
mapping data will be accepted as the standard. 
The committees also decide upon the official 
nomenclature for map sites and for probes, and 
their deliberations provide a measure of quality 
control for the research. Data accepted at the
workshop are submitted to the Human Gene Map-
ping Library in New Haven, Connecticut, and sub-
sequently entered into that database (see app. D).
In 1987, a new database, Genatlas, was initiated 
specifically for the purpose of managing the map- 
ping data from HGM 9. The conference proceed- 
ings are published in the Journal of Cytogenetics 
and Cell Genetics. Proceedings of some of the con-
ferences have been independently published.

The growth in the size of the HGM workshops 
is one indication of the overall growth of the field 
of human genetics. Early conferences attracted 
an exclusive group of participants, but the ninth 
drew hundreds. Data are accumulating so rap-
idly that biennial conferences may not be suffi-
cient; plans are already underway for an infor-
mal workshop, dubbed HGM 9.5, to be held in
1988 (9). 



International Journals 

The scientific publication process is the most 
important form of data sharing within and across 
national borders— an ongoing form of interna- 
tional cooperation. A bibliometric analysis of the 
international literature showed a rapid rise in the 
number of mapping and sequencing articles pub- 
lished in international journals between 1977 and 
1986 (see figure 7-2). U.S. researchers have con-
sistently contributed the largest number— from
38 to 46 percent of all articles with genetic map
or linkage results (see figure 7-1, table 7-1, and
app. E) [Computer Horizons, Inc., see app. A]. The
United Kingdom is the next largest contributor,
publishing 8 to 11 percent of the articles annu-
ally, while France and West Germany are next
with 5 to 8 percent and 2 to 5 percent, respec-
tively. Japan's share of the basic research has in-
creased fairly steadily, from 2 percent to 5 per-
cent of the total. These data show the international
nature of genome research and of the medical and
scientific literature in general. There exists some
segregation of Eastern European journals due to
restrictions on export of information, and lan-
guage may pose a barrier for non-English-speaking
scientists (since many international journals are
published in English), but for the most part scien-
tific journals are thoroughly international. Scien-
tists from one nation freely report data in jour-
nals from another.

Figure 7-2.— Human Gene Mapping and Sequencing
Articles Published Annually

A bibliometric analysis conducted for the Office of Technol-
ogy Assessment by Computer Horizons, Inc. [app. A] showed
a steady increase in the total number of articles published
annually in international journals on human gene mapping,
gene markers, nucleotide sequences, and related topics from
1977 through 1986. [See app. E for details on the key words
used in the literature search.]

SOURCE: Office of Technology Assessment, 1988.

Databases and Repositories 

The operation of databases and repositories has
been a standard mode of international coopera-
tion in many scientific fields, and human genome
projects are no exception; several databases and
repositories relevant for human genome projects
exist (see app. D). The cooperative arrangements
that have evolved among the international data- 
bases for nucleotide sequences and for protein 
sequences are examples of effective international 
collaboration. 

Databases for nucleotide sequences were started 
at Los Alamos National Laboratory (later funded
by NIH and operated under the name GenBank®)
and EMBL (officially dubbed the EMBL Data Li-
brary) during the late 1970s. By the fall of 1980,
the database organizers recognized the need for 
collaboration between the two, and from 1980 
through 1982 the databases exchanged sequence 
data on an informal basis until their first major 
releases. In August 1982, GenBank® and EMBL 
held their first joint meeting and agreed to use 
a similar system of accession numbers and to di- 
vide the journals each would scan for data. The 
compatibility of the databases was further en- 
hanced by agreements, reached in 1985 and 1986, 
on common sets of data and annotation. The DNA 
Data Bank of Japan formally joined the collabora-
tion in May 1987. The division of responsibilities
for various aspects of the operation of the data- 
bases was formalized in meetings in the summer 
and fall of 1987. 

An international workshop on database needs 
in molecular biology was convened in Heidelberg, 
West Germany, in 1987. The participants recom-
mended that an international advisory commit-
tee composed of experts from the fields of molecu-
lar biology and information sciences be formed
to provide advice and guidance for expanded co- 
operation among the databases (35). The funding 
agencies that support the databases followed the 
recommendation and appointed a committee, 
which consists of three members from the United 
States, three from Europe, and two from Japan. 
The committee will meet yearly to advise data- 
base staff on matters such as format and annota- 
tion. Its recommendations are not binding, how- 
ever, since each database is responsive primarily 
to the agencies that support it. The first meeting 
was held in February 1988. 

Formal collaboration on protein sequence data- 
bases is more recent. The U.S. database, the Pro- 
tein Identification Resource (formerly known as 
the Dayhoff database, or NBRF), was started in 
the late 1960s. The European and Japanese coun-
terparts—the Martinsried Institute for Protein Se-
quence Data (MIPS) (27) and the Japan Interna-
tional Protein Information Database (JIPID)—
began operations in 1987. The close collaboration
among the three includes use of the same format,
the same software, and a regional division of mon-
itored journals (41).
The continued development and maintenance
of databases and repositories are the most com- 
monly endorsed mode of international coopera- 
tion on human genome projects (see box 7-D). The 
National Academy of Sciences supported the estab- 
lishment of an international organization to gather 
and distribute data and materials in its 1988 re- 
port on human genome mapping (62). 
Chapter 8

Technology Transfer 



"The politics of knowledge— the question of who owns and controls the distribution 
and use of scientific information— is by no means a new issue. The pure scientist work-
ing in an ivory tower has long been extinct."

Dorothy Nelkin, Science as Intellectual Property: Who Controls Research?
(Washington, DC: American Association for the Advancement of Science, 1984), p. 92.



The economic impact of genome projects will 
depend on how many new products and services 
are created by them. Some large scientific projects 
such as space programs and electronics research 
facilities have been justified by their potential for 
spinning off technologies. The magnitude of such
spinoffs is unpredictable, however. Often, there
emerge useful products that could not have been
foreseen [Heilbron and Kevles, see app. A]. Given
the many surprises in molecular biology over the
past decade, it is impossible to predict exactly how
genome projects will result in products, but they
undoubtedly will yield many new applications in 
pharmaceuticals, agriculture, and other industrial 
sectors. Uncertainty about the magnitude of eco- 
nomic impact means that genome projects can- 
not be justified purely as an economic investment. 
As the projects go forward for scientific and med- 
ical reasons, however, it makes sense to ensure 
that their results are fully used. The process of 
converting scientific knowledge into useful prod- 
ucts is technology transfer. 

The Federal Government influences the effi-
ciency of technology transfer through its research
and development policies. Government has tradi- 
tionally supported research that will have large 
but unmeasurable noneconomic benefits (e.g., re- 
search aimed at improving health as a value in 
itself rather than simply disease impact measured 
in dollars) or that is too risky for individual firms
to support (e.g., projects that are expensive, highly 
uncertain in outcome, or long-term). Arguments 
for increased Federal support of biomedical re- 
search since World War II have generally empha- 
sized improvements in health. Economic argu- 
ments for increased biomedical research funding 
have typically been analyses of economic drag—
how much the Nation could save by avoiding dis-
ability or disease (18). This argument is changing
to concern for efficient translation of science into
products. Policymakers are shifting their atten- 
tion to technology transfer as products derived 
from molecular genetics find their way to the mar- 
ketplace, international trade imbalances worsen, 
and rising deficits intensify scrutiny of Federal 
budgets. 

A major effort is underway in many developed
and some developing nations to target biotech-
nology for investment because it is considered par-
ticularly likely to produce economic benefits
(3,16,19,23). Most foreign governments' efforts to 
promote biotechnology include strategic planning 
of national research programs and encourage- 
ment of research and development in private firms 
(e.g., tax incentives, subsidies for industrial re- 
search centers, business grants, or government 
risk capital). The United States has no deliberate 
Federal policy to encourage biotechnology per se 
(16,19,23), although legislation introduced late in 
the first session of the 100th Congress would cre-
ate a national biotechnology policy board.

Most genome projects could produce both di- 
rect and indirect economic benefits. Some projects 
are expected to yield directly marketable prod-
ucts (e.g., DNA sequenators, analytical instru-
ments, DNA probes for diagnostic tests). Others
would accelerate development of products (e.g., 
maps, repositories, and databases). 

Different groups have divergent concerns about 
technology transfer. Scientists fear that corporate 
participation will inhibit the free flow of informa-
tion and impede scientific progress. Policymakers
want to ensure that a large Federal investment 
in genome projects is translated efficiently into 
new products and services, ultimately creating 
new jobs and other economic benefits. They are 
wary of projects in which U.S. taxpayers will fund
research that is commercialized and used by for-
eign interests. In this view, foreign governments 
should support an equitable fraction of basic re- 
search, and American investments should not al- 
low jobs and profits to migrate abroad. Industrial 
representatives want a say in planning research 
programs and access to scientific results as they 
are produced. Individual companies wish to en- 
sure that any funds they invest will earn suffi- 
cient returns. 

Congress could encourage technology transfer
by funding personnel exchange among govern-
ment, academic, and industrial sectors, with min-
imal bureaucratic strictures, and by supporting
symposia, journals, and other modes of informa-
tion exchange. When advisory committees are
formed to guide Federal genome projects, indus-
trial representatives could ensure that projects
are planned with an eye to economic exploitation.
These options are covered in chapter 6. The re- 
maining options relate to protection of inventions 
resulting from federally funded research, dis- 
cussed below. 



PATENT AND COPYRIGHT POLICIES



Ideas and know-how— intellectual property— 
are granted many of the same legal protections 
as tangible private property. Intellectual property
law traces its roots directly to the U.S. Constitu-
tion, which authorizes the Federal Government
"to promote the Progress of Science and the use-
ful Arts, by securing for limited times to their
Authors and Inventors the exclusive Right to their 
respective Writings and Discoveries." The purpose 
of intellectual property protection is to encourage 
inventors and discoverers to share their knowl-
edge, while ensuring that they benefit from the
fruits of their labors. Legal protections balance 
the social good stemming from wide disclosure 
of new knowledge against individuals' or compa- 
nies' rights to gain from what would not have ex- 
isted without their efforts. 

Three types of intellectual property protection 
are relevant to discussion of the technologies likely 
to emerge from genome projects: patents, copy- 
rights, and trade secrets. 

Patents 

Patents grant inventors the right to exclude 
others from producing, using, or marketing their 
inventions (as defined in the patent claims) for 
a specified period. The purpose of patent law is 
to give inventors an incentive to risk their time 
and money in research and development, while 
requiring public disclosure. Patent laws in differ-
ent countries vary in degree of protection, en-
forcement, penalties for violation, and criteria for
approval. In the United States, the period of pro-
tection is 17 years, with extensions for pharma-
ceuticals to cover some of the delay imposed by
regulation. Patents apply to inventions, but not 
to ideas, mathematical formulas, or discoveries 
of preexisting things. A patentable invention must 
be new, useful, and not obvious. A patent holder 
can permit others to use or make the invention 
by licensing it. 

Profit is only one of many motivations for patent- 
ing an invention. Another is to maintain control 
over it. Leo Szilard filed a patent on the process
of nuclear fission, for example, hoping to bring 
it to the attention of military authorities in the 
United States and Great Britain (12). Cyclotrons 
used in nuclear physics were patented to ensure 
their proper medical applications, yet this did not
inhibit research (in fact, most physicists were not
even aware of the patents) [Heilbron and Kevles,
see app. A]. The Rockefeller Institute patented the
sphygmomanometer (blood pressure cuff) to en- 
sure that clinicians would have ready access to 
it and that later discoverers could not limit its use 
(11). Nonprofit organizations supporting genome 
projects are likely to encourage patents when they 
would ensure broad public use (9). 

U.S. patents are obtained from the Patent and 
Trademark Office in the Department of Com-
merce (other nations have analogous institutions).
The patentability of inventions is initially deter- 
mined by this office. The scope of protection and 
more refined factors for granting patents are de- 
fined by case law, when patents are challenged 
in court. The principles for determining patenta-
bility do not depend on any particular type of tech-
nology, but interpretation of them does. Uncer-
tainty about the patentability of inventions is
greater in biotechnology than in most other areas
because the techniques are new and complex. In-
terpretation of criteria for granting patents, defin-
ing the scope of patent claims, and determining
what constitutes infringement have not been clar-
ified by case law, because the case law does not
yet exist.

Patent Policies for Federally 
Funded Research 

Uncertainty about patentability need not para-
lyze research efforts, because interpretation of
patent law does not interfere with most federally
supported research (as explained below). Patent
policies of Federal agencies will nonetheless in-
fluence how genome research is commercialized,
and these patent policies have changed dramati-
cally over the past decade. The changes are in-
tended to promote commercial application of fed-
erally funded research by permitting private
ownership and control of its results. The reason- 
ing is that research will be more broadly dissemi- 
nated and effectively used if those who conduct 
it are granted title to the patents on resulting in- 
ventions, thus providing an incentive to commer- 
cialize the inventions [Rosenfeld, see app. A]. 

Changes in patent policy resulted from studies 
showing that, while the Federal Government held
title to roughly 28,000 inventions in 1975, fewer 
than 5 percent had been licensed to businesses 
(15). The Patent and Trademarks Amendments 
of 1980 (Public Law 96-517) were passed to grant 
title to small businesses and nonprofit organiza-
tions funded to do research by the Federal Gov-
ernment. These were further amended in the
Trademark Clarification Act of 1984 (Public Law 
98-620), most significantly by removing restric-
tions on licensing. Regulations implementing these
laws were made final by the Department of 
Commerce in March 1987. 

The policies applying to small businesses and
nonprofit organizations were extended to large 
businesses, with some exceptions, by a memoran- 
dum from President Ronald Reagan dated Febru- 
ary 18, 1983. The Technology Transfer Act of 1986 
(Public Law 99-502) permitted new licensing and
joint venture arrangements, and granted agen-
cies authority to form consortia with private con-
cerns. Executive Order 12591, issued by President
Reagan in April 1987, encouraged technology
transfer of federally funded research. The order 
was based on existing statutes and promoted con- 
sortium formation, exchange of research person- 
nel between government laboratories and indus- 
trial firms, special technology transfer programs 
at federally owned laboratories, and transfer of 
patent rights to government grantees and con- 
tractors. 

The policies embodied in these statutes, regu-
lations, and executive orders constrain the author-
ity of Federal agencies to force sharing of data
if sharing would conflict with recipients' taking 
title to inventions [Rosenfeld, see app. A]. The gov-
ernment can override recipients and take title to
patents only in special situations. One of these is 
when an agency determines that retaining title 
"will better promote the policy and objectives" of 
the patent statutes. This clause has been narrowly 
interpreted and has rarely been used by the re- 
search agencies involved in genome projects 
[Rosenfeld, see app. A]. The Federal Government 
can also impose licensing requirements to "allevi- 
ate health and safety needs," "meet requirements 
for public use specified by Federal regulations," 
or meet "certain statutory provisions requiring 
products to be manufactured in the United States."
These provisions have also been narrowly inter- 
preted and impose such a daunting burden of 
proof on agencies that they are unlikely to be used. 
They could conceivably be invoked if patent rights 
interfered with the pooling of data that must be 
collective to be useful or if clinical benefits were
delayed (e.g., slow commercialization of genetic
tests or therapies), but only if problems were se- 
vere and obvious. 

Federal research agencies' patent policies need
not unduly slow exchange of information. The de- 
gree to which information flow is impeded will 
depend on when grant recipients and contractors 
file patent applications. Many genome projects will
result in patentable inventions, particularly those 
focused on technology development. Recipients 
of Federal funds may follow one of three courses 
of action: file applications early and subsequently 
release data; file early and do not take extra ac-
tions to release data (relying on the patent proc-
ess to do so); or decide not to patent.

Filing patent applications early and publishing 
data soon thereafter are optimal for encouraging 
rapid dissemination of knowledge, protecting in- 
ventors' rights, and preserving economic bene- 
fits in the United States. Early patenting and sub- 
sequent disclosure would release data for public 
use but would help inventors maintain control of 
their inventions and assure them and their spon- 
soring institutions of any financial rewards. Early 
patent application would also protect the Nation 
because statutes give a preference to U.S. manu- 
facture of any resulting products or services. Early 
patent application not followed by special efforts 
to disseminate data would ensure benefits for the 
grant recipient or contractor but would needlessly 
delay exchange of useful information— patents are 
often not published for several years, and it has 
taken over 7 years for some biotechnology patents 
to be awarded. 

Investigators may decide not to apply for a pat-
ent because they wish to avoid substantial legal
costs and bureaucratic entanglements or because 
they believe that science should not become com- 
mercially oriented. This can make new methods 
freely available to all, but it can also inhibit full 
exploitation of an invention. It is also against the 
intent of Federal statutes, which require recipi- 
ents of Federal funds to report patentable inven- 
tions. An inventor can lose control of an inven- 
tion if he or she does not file a patent and another 
inventor does so. A product or process that is not 
patented is unlikely to be used commercially, be- 
cause any firm investing in manufacture will want 
a guarantee that its investment will be protected. 
Failure to patent also invites foreign exploitation 
of research funded at U.S. taxpayers' expense: Pat- 
ent rights could be claimed by a foreign company, 
research institution, or individual; U.S. firms 
would not be given manufacturing preference; 
and the U.S. inventor could be prevented from 
use of the invention. Export of economic benefits 
has occurred frequently in biological sciences 
when initial discoveries have not been patented. 
Penicillin was discovered in England, for example,
but the patent was obtained by U.S. corporations.
The cell fusion process for making monoclonal
antibodies was developed in London, but
many of its applications were exploited first in
the United States. In both cases, the United
Kingdom claimed the Nobel Prize, but the United States
reaped most of the economic benefits.

Federal agencies and Congress may wish to over- 
see patent practices of grantees and contractors 
closely to ensure that patents are filed early and 
data exchanged soon thereafter. Disclosure of data
should not be long delayed by policies designed 
to encourage patenting of inventions, because data 
per se are not inventions eligible for patent pro- 
tection. There is a gray area, however, between 
invention of new methods and the data that result
from using them.

Scientists may be reluctant to disclose details of
methods used to generate data if doing so en- 
dangers patentability. An invention must be novel 
to be patented: that is, it must not be widely used 
by parties other than the inventor for more than 
one year, and publication of the method cannot 
precede filing the patent by more than one year. 
(Some foreign countries do not permit even the 
one-year grace period.) If investigators are
uncertain whether disclosing details of a method would
threaten a patent, they may choose not to publish
those details. Uncertainty over patentability
can indeed inhibit the free exchange of information.
It has led one commentator to list three possible
ways of altering patent laws: 1) making the
definition of novelty more flexible; 2) establishing
an intellectual property protection that is analogous
to but more limited than patents and that
requires less rigorous proof of novelty and
nonobviousness; or 3) legislating special intellectual
property protections for biotechnology (8). Further
study is needed "to determine whether and how 
biotechnology demands special treatment as in- 
tellectual property before legislative reform will 
be in order" (8). This suggests that patent policies 
might be high on the agenda for congressional 
oversight but low on the legislative calendar. 

Filing patents early and then disclosing the re- 
sults could worsen an already considerable back- 
log of pending patents. Approximately 7,000 bio- 
technology patents have been filed at the Patent 
and Trademark Office and await final action (20). 
If the benefits of patent protection are judged
important by Congress, then one option would be
to increase the resources in the biotechnology
sections of the Patent and Trademark Office. This
could include higher salaries, more opportunities
for training to keep abreast of technological
developments, easier access to technical databases,
and more examiners. Increased resources could 
not only reduce uncertainty by diminishing the 
backlog of pending patents, but also increase the 
attention devoted to each application and reduce 
subsequent litigation. 

Patent Policies at Research Agencies 

The Department of Commerce recently promul- 
gated final regulations for Federal agencies to use 
when funding research at small businesses, 
universities, and nonprofit organizations [37 CFR
Part 401]. While these regulations, issued in March
1987, have had little time to take effect, the Na- 
tional Science Foundation (NSF) and the National 
Institutes of Health (NIH) have followed similar 
policies since the late 1970s. 

The General Accounting Office found that uni- 
versity administrators, industry representatives, 
and small businesses all reported a "significant
positive impact on research and innovation" from 
taking title to inventions that resulted from fed- 
erally funded research. University and industry 
officials also reported benefits from the 1984 law 
that removed licensing restrictions (15). Agencies 
likewise reported a generally positive assessment, 
with greater potential for licensing patents than 
when title was retained by the Federal
Government.

The situation at the Department of Energy (DOE) 
is more complex. A substantial fraction of DOE 
research funding goes to national laboratories, 
which are owned by the Federal Government and
operated by private contractors. At most of the 
laboratories, the contractor can elect to take title 
to inventions. Title rights are restricted, however,
at facilities that conduct research on weapons and 
naval propulsion systems. This could prove rele- 
vant to genome projects because several of the 
groups that have been directly engaged in DOE's 
Human Genome Initiative are located at
laboratories with restricted title policies, namely
Lawrence Livermore National Laboratory and Los
Alamos National Laboratory, both operated by the
University of California. Regulations state that
limitations on the contractor's right to take title should
be restricted to "inventions occurring under"
naval nuclear propulsion or weapons-related
programs [37 CFR Part 401.3(a)(4)]. This should
permit the contractor to take title to inventions from
human genome projects because the projects 
would not be conducted under restricted pro- 
grams, even at the affected laboratories. Negotia- 
tions between DOE and contractors are more com- 
plicated, however, when restrictions differ among 
programs at the same facility. Legislation has been 
proposed to mandate patent policies for genome
projects at the national laboratories; the policies 
would be modeled on those of other research 
agencies. 

The regulations and executive orders imple- 
menting patent policies at research agencies are 
quite recent. It would be premature to alter those 
policies fundamentally until the results of current 
law can be assessed (with the possible exception 
of DOE policies regarding national laboratories,
noted above). 

There are additional roles for Congress. First, 
Congress could monitor the practices of Federal 
agencies and funding recipients to ensure that the 
intent of existing statutes is carried out. Second, 
Congress could increase resources to the Patent 
and Trademark Office to enable more efficient 
processing of patents. Third, Congress could in- 
crease resources for universities and other recip- 
ients in order to manage patent filing in the United 
States and abroad. Finally, Congress could ask 
agencies engaged in genome projects to specify 
their patent policies more clearly. At present, writ- 
ten material on patent policies at NIH, DOE, and 
NSF is difficult to obtain, and there is no single 
source for information on patent policies at all 
agencies involved in genome projects (Rosenfeld, 
see app. A). The interagency nature of genome
projects means that recipient institutions will often 
be funded by more than one agency. A clear pres- 
entation of patent guidelines at various agencies, 
with explanations of the advantages of early pat- 
ent filing and the implications of doing so (and 
not doing so), might diminish confusion and
promote commercial application.

Copyrights

Copyright law is intended to protect works of 
authorship. It has traditionally been applied to 
works of art, books, and articles but has had to 
adapt to technological change. Copyrights now 
extend to computer software and electronic en- 
tertainment media, for example (6,17). Copyright 
is intended to protect the expression of ideas, not 
the ideas themselves, a difficult but crucial
distinction.

The Copyright Act of 1976 is the most recent 
statute relevant to genome projects, extending pro- 
tections to nontraditional media such as computer 
software. The extensions may also prove relevant 
for research in molecular biology (6). Case law 
has evolved doctrines to test the distinction be- 
tween idea and expression and to define the scope 
of protection. An author can prohibit others from 
copying his or her book, for example, but the con- 
cepts and methods described in the book are not 
protected. Arguments have been made that copy- 
right could apply to DNA (6), but this line of argu- 
ment is not widely accepted and the scope of pro- 
tection (if it exists) is quite narrow (5). The ability 
to copyright a native DNA sequence derived from 
a human chromosome or other natural source is 
particularly uncertain (5). A preliminary commu- 
nication from the Copyright Registration Office 
indicates that such sequences would not be ac- 
cepted, although the book or printed map con- 
taining them— the particular expression of map 
or sequence data— would (10). 

Even if DNA maps can be copyrighted, such 
copyrights are unlikely to inhibit research sub- 
stantially. In normal circumstances, obtaining a 
copyright does not require extra time and is thus 
not a justification for delaying disclosure of re- 
sults. A company could charge for access to map 
or sequence information in much the same way 
that commercial databases charge for informa- 
tion sharing. Access and service charges are not 
new: molecular biologists routinely pay for
services that are less expensively or more rapidly
performed by others. They buy copyrighted books
and read copyrighted journals. Many materials 
used in biological research (clones, enzymes, 
chemicals) can be made by individual investiga- 
tors, but it is easier to purchase such materials 
from a company set up to make them. 



The type of research conducted by a private 
company engaged in mapping and sequencing 
DNA would be feasible in a large number of lab- 
oratories. Copyrights would not prevent investi- 
gators from using information published or other- 
wise provided by a company or from duplicating 
the work. A company that has developed exten- 
sive map and sequence information would either 
charge so little that it is cheaper for a researcher
to obtain it from the company than to do the work, 
or the researcher would in fact repeat the work. 
In either case, the community of researchers is 
no worse off than if the company had not mapped 
or sequenced. 

If copyright practices prove to impede research, 
then agencies can take steps to correct the defi- 
ciencies. Agencies have much broader discretion 
for copyright policies than for patents (Rosenfeld, 
see app. A).

Trade Secrets 

Information held by one company that is useful
in its business and unavailable to competitors
is called a trade secret. Trade secrets can be
protected from misappropriation (that is, improper
disclosure) through the courts, which award
monetary damages for unauthorized use. A trade 
secret must be in continual use, be well established 
in practice, and have actual or potential commer- 
cial value (19). The holder must take steps to guard 
it. Trade secrets do not involve slow and costly 
legal steps for registration, their duration is not 
limited by law, and they need not meet patent or 
copyright criteria. Uncertainties about patents and 
copyrights are not relevant (although legal criteria 
for protection under trade secret laws must be
met). Trade secret protections are principally
secured under State rather than Federal laws, and
there is some variation among the States. Trade 
secrets have limited scope: In a rapidly moving 
field they may not last long. Trade secret laws 
cannot ensure returns on a research investment 
if another inventor discovers the secret method
or finds a new way to do the same thing. Nor does
protection apply if competitors figure out
the secret by examining a product (reverse
engineering). Most important, trade secrets must be
kept secret. This would be quite difficult to justify 
for federally funded research. 

The scientific equivalent of a trade secret is
nondisclosure. This is referred to pejoratively as
sitting on data and is widely viewed as improper
beyond the period needed to confirm accuracy 
of results and take advantage of a lead for fur- 
ther research. The period of nondisclosure varies 
widely among researchers, even those in the same 
field. Researchers who share data and materials 
early and freely are widely praised, such as the 
many collaborators who worked to find the mus- 
cular dystrophy gene (see ch. 3). But nondisclo- 
sure—for a few months to a year— is not uncom- 
mon in order to maintain a research advantage 
or to establish first discovery, even in research 
leading to Nobel Prizes (or perhaps especially in 
such research) (21,22). Permanent nondisclosure 
of an important result is, however, inimical to the 
purpose of scientific inquiry— the discovery and 
dissemination of new knowledge.

Nondisclosure is of particular concern when the 
results must be pooled in order to be useful (e.g., 
maps derived from data contributed by various 
groups). The need for pooled data can create a 
situation known as the prisoner's dilemma, in which
cooperation of all parties yields the maximum
benefits, but one party can benefit if he does not
cooperate and the others do. (So called because
prisoners planning a jail break all benefit from
cooperation, but one stooge can benefit individually
by telling the guard of the plans.) An investigator
searching for the location of an unknown
gene stands to gain if other groups with markers
make them freely available but he does not. He
can then use both his and others' work to speed
the search, while denying others access to his
markers. Similar situations will arise in connection
with submitting information to databases, sending
materials to other researchers or central
repositories, and other cases directly related to
genome projects. Agencies will need to monitor the
free exchange of data and materials, particularly 
when the efforts must be collective, and take steps 
to correct inequities. The need for joint efforts 
highlights the importance and fragility of col- 
laborative institutions such as the Center for the 
Study of Human Polymorphism (CEPH) (see ch. 7).
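
The incentive structure described above can be made concrete with a small payoff table. The following is only a sketch: the payoff numbers are hypothetical units of research progress chosen for illustration, not figures from this report, and the names `PAYOFF`, `best_response`, and `joint_payoff` are invented for the example. It shows that withholding markers is the better choice for each group taken alone, while mutual sharing yields the highest combined benefit.

```python
# Hypothetical payoffs (arbitrary units of research progress) for one group,
# indexed by (own choice, other group's choice). These numbers are
# illustrative assumptions, not data from the report.
PAYOFF = {
    ("share", "share"): 3,        # both pool markers: the map is built quickly
    ("share", "withhold"): 0,     # we share, the other group free-rides on us
    ("withhold", "share"): 5,     # we free-ride on the other group's markers
    ("withhold", "withhold"): 1,  # nobody pools: everyone progresses slowly
}

CHOICES = ("share", "withhold")


def best_response(other_choice):
    """Return the choice that maximizes our own payoff, given the other's choice."""
    return max(CHOICES, key=lambda mine: PAYOFF[(mine, other_choice)])


def joint_payoff(a, b):
    """Return the combined payoff to both groups when they choose a and b."""
    return PAYOFF[(a, b)] + PAYOFF[(b, a)]


if __name__ == "__main__":
    for other in CHOICES:
        print(f"Other group chooses {other!r}: best response is {best_response(other)!r}")
    pairs = [(a, b) for a in CHOICES for b in CHOICES]
    print("Highest combined payoff at:", max(pairs, key=lambda p: joint_payoff(*p)))
```

Under these assumed numbers, each group does better by withholding whatever the other does, which is why an outside party such as a funding agency or a collaborative institution may be needed to sustain sharing.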

Many journals have either explicit or unwrit- 
ten policies that research data and materials de- 
scribed in an article must be made available to 
other researchers at the time of publication.
Researchers preserve their option for exclusive use
from the time of discovery until publication. Many 
scientists make materials available even before 
publication, which can require many months. 
Linking availability of materials to publication is 
a powerful mechanism, because one measure of 
scientific prestige is priority— who discovered 
something first. Priority is generally determined 
by date of publication. In large collaborative sci- 
entific projects, mechanisms have evolved to per- 
mit scientists time to pursue hot research leads 
while ensuring that others gain fair access. (CEPH's 
policy of sharing one set of data only among col- 
laborators and making another set publicly avail- 
able is an example.)

An informal policy of disclosure operates in Federal
agencies through the process of peer review.
If a researcher is known to hoard data (and such
information spreads rapidly through scientific
communities), then proposals submitted by that
individual are unlikely to be given high priority
by study sections (7). Review groups withhold
support from research whose results they cannot see.
This mechanism is slow (it can only be used when
a grant is up for renewal, every 3 years or more),
but it can be quite effective. If further measures
are needed, Federal agencies could require
submission of materials and data (map positions or
DNA sequence data, for example) to the
appropriate database. Such a policy would not be
easily enforceable, however, and would be
constrained by investigators' patent rights. Some
journals now require submission of DNA
sequences in proper form to GenBank® or its sister
database in Europe at the time a paper is accepted.
Agencies could devise incentives to make
contribution of data and materials attractive, an
alternative that is more easily implemented and less
politically troublesome than negative sanctions.
Those submitting data to CEPH, for example,
benefit from knowing the position of their markers
relative to markers found by others. Persons
managing the DNA sequence databases have
contemplated giving researchers a similar incentive.

Federal agencies have substantial power to re- 
quire disclosure when it does not impede gran- 
tees' and contractors' intellectual property rights. 
Grant recipients and contractors need ample time 
to file patent applications, but legal protections
of intellectual property are unlikely to inhibit
agency policies promoting disclosure, particularly
when broad access to data is necessary to fulfill
the agency's mission.



INTERNATIONAL TECHNOLOGY TRANSFER 



Human gene mapping is inherently international 
in scope. Recent breakthroughs in assembling
rough genetic maps, for example, have depended
on an international collaboration of investigators
from Europe, North America, and Africa using
family data from four continents. Several current
technologies for sequencing and physical mapping
were developed in the United Kingdom and other
European nations, not the United States; however,
recent years have seen increased emphasis on
retaining the economic benefits of federally funded
research for the United States. 

International technology transfer is the move- 
ment of inventions and know-how across national 
borders. Concerns about international technology 
transfer fall into four areas: economic benefits,
humanitarian and scientific benefits, national
prestige, and military applications.

Economic Benefits 

Concerns about economic implications of inter- 
national technology transfer focus primarily on 
the export of jobs and services generated by
research funded at public expense. Policies to
combat this fall into three main areas: patent policies,
restrictions on flow of information and materials,
and promotion of domestic technology transfer
so that benefits remain within national
borders.

The patent policies described above have several
provisions on international technology transfer
that are relevant to genome projects. For foreign
recipients of Federal funds or those subject
to a foreign government, agencies must consider
whether the recipient's government or company
enters into international cooperative funding
agreements on a "comparable basis" and whether
the recipient's government protects U.S. intellectual
property rights [Executive Order 12591, Apr.
10, 1987]. Recipients of Federal R&D funds must
ensure that the products of the invention will be
"manufactured substantially in the United States"
[35 U.S.C. §204]. Since jobs and economic wealth
are linked more tightly to manufacturing than to
initial research and development, even foreign-held
U.S. patents resulting from Federal funding
would have economic benefits in the United States.
Moreover, Federal agencies are not required to
grant patent rights to foreign recipients or those
subject to control of a foreign government, even
if they are universities or nonprofit organizations
[37 CFR 401.14(a)(1)]. Foreign recipients are thus
managed differently than their U.S. counterparts.
Agencies could conceivably require foreign
recipients to assign title to the U.S. Government or
require that U.S. research partners take title.

Exploiting federally funded research inventions 
abroad will usually entail seeking foreign patents. 
Several international conventions govern patents,
but conditions for granting patents differ among
nations. The United States permits a grace period
of one year from the date of publication to file
a patent application, for example, but many other
governments do not. If investigators wish to
ensure worldwide patentability, therefore, they must
file foreign patents before publication. The period
of patent protection also differs. Research
institutions accepting Federal funds must know
about these and other differences when making 
decisions about foreign patents. Disseminating 
knowledge about such differences could be en- 
couraged by research agencies in concert with 
the Department of Commerce. Agencies could also 
encourage institutions receiving Federal funds to 
pursue foreign patents. 
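
The difference between a one-year grace period and no grace period can be illustrated with a short date check. This is a deliberately simplified sketch and not a statement of patent law: the function names and the `grace_period` flag are hypothetical, the only rule modeled is "publication may precede filing by at most one year where a grace period exists", and filing on the same day as publication is conservatively treated as not preceding it.

```python
from datetime import date


def one_year_after(d):
    """Return the same calendar day one year later (Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:  # Feb 29 has no counterpart in a non-leap year
        return d.replace(year=d.year + 1, day=28)


def can_still_file(published, filed, grace_period):
    """Simplified novelty check based only on publication and filing dates.

    published: date the method was published, or None if unpublished
    filed: date the patent application is filed
    grace_period: True for a one-year grace period, False for none
    """
    if published is None or filed < published:
        return True  # filing precedes any publication
    if not grace_period:
        return False  # prior publication bars the application
    return filed <= one_year_after(published)


if __name__ == "__main__":
    published = date(1987, 3, 1)
    filed = date(1988, 2, 1)
    print("With one-year grace period:", can_still_file(published, filed, True))
    print("Without grace period:", can_still_file(published, filed, False))
```

In this model, an investigator who publishes first and files eleven months later would keep the option to file where a grace period applies but lose it where none does, which is the reason for filing before publication when worldwide protection is wanted.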

The current necessity for filing patents individu- 
ally in many countries is expensive and wasteful 
for all nations. International patent policies have 
been discussed several times at meetings of the 
Organization for Economic Cooperation and
Development. Attempts are being made to harmonize
international practices (14). 

Humanitarian and Scientific Benefits

The humanitarian and scientific benefits of ge- 
nome projects will be great. The United States has 
consistently performed a significantly higher frac- 
tion of the total mapping and sequencing effort 
than any other nation (see ch. 7). The knowledge 
resulting from these efforts has been freely shared 
with the rest of the world, to the benefit of citizens 
of all nations. The scientific knowledge generated 
at Federal expense since World War II may well
prove to be one of the most significant
international contributions of modern American culture.

Imposing restrictions on the flow of informa- 
tion and scientific materials from U.S. research- 
ers to researchers abroad would be politically 
troublesome and technically difficult. Details of 
what to share and what to restrict would be diffi- 
cult to describe in advance, and policies restrict- 
ing the flow of data are against scientific traditions, 
which transcend national borders. Withholding 
map locations and DNA sequence information 
would be a violation of scientific ideals, particu- 
larly when such information could be clinically 
useful. Unilateral restrictions imposed by the 
United States would invite reciprocation, to the 
detriment of worldwide scientific progress. 

The same tradition of free international exchange
does not necessarily apply to the exchange
of services and products (for example, mapping
services, instruments, automation equipment, and
reagents), which is governed more by international
trade agreements than by scientific practices.
Many national governments wish to assist
their companies in developing goods and services 
for export. Genome projects focused on technol- 
ogy development are likely to be seen in this light. 
Nationalistic economic policies make projects to 
develop instruments or other salable goods poor 
candidates for international cooperation. Euro- 
pean nations may be exceptions, because they 
have a basis for cooperation through several bio- 
technology programs of the European Economic 
Community. 

Restrictions on international exchange of
scientific personnel would disrupt many molecular
biology laboratories in the United States and
abroad. The United States has often reaped the
benefits of international scientific exchange. Sen- 
ior scientists, postdoctoral fellows, and graduate 
students from other nations work in U.S. labora- 
tories and attend conferences. In exchange, U.S. 
scientists visit and are occasionally educated at 
universities and research centers abroad (12,22). 
The team of scientists that developed the atomic 
bomb for the U.S. Army, for example, was heav- 
ily dependent on scientists trained in Europe (12). 
Molecular biologists from abroad have often
settled in the United States because it is so conducive
to scientific research; several Nobel laureates at
American universities immigrated during their
scientific careers. Many projects in molecular
biology have depended heavily on foreign scientists
working in the United States, and many of the 
best stay or eventually return (4). The United 
States may in fact benefit from international per- 
sonnel exchanges more than it is hurt by them. 
The Federal Government could nonetheless limit 
funding of foreign researchers at U.S. institutions, 
although this would probably generate ill will and 
provoke reciprocal actions by other governments. 

One of the problems in assessing the potential 
impact of policies to reduce funding of foreign 
researchers in American laboratories is the ab- 
sence of information about their research careers. 
If most foreign researchers remain in the United 
States or are particularly productive investigators 
while receiving Federal funds, then policies to re- 
strict their ingress would be counterproductive. 

Extending current restrictions on use of Federal
funds for American researchers to travel
abroad would be even less politically acceptable
and more difficult to implement. It would result
in direct loss of information to the United States,
because persons traveling abroad are as likely to
import information from their foreign collaborators
as to export it. Policies designed to inhibit
the exchange of personnel, materials, and
information across national borders threaten benefits
but gain little for the United States.

Promoting domestic exploitation and foreign 
patenting of new technologies is a more positive 
and less politically troublesome means to the same
end of improving U.S. economic competitiveness. 
Such policies can preserve the U.S. lead in research 
without provoking retaliation or tarnishing the 
country's prestige. 

National Prestige 

One argument for Federal sponsorship for
genome projects is that they are highly conspicuous
and beneficial: Other nations will do the work
if the United States does not, to the detriment of
U.S. prestige. Similar arguments have been
proffered for the supersonic transport, space
programs, and other technical projects. These
arguments tie the stature of U.S. science and
technology to leadership of genome projects. The 
international prestige attached to genome projects 
is a purely political judgment; it cannot be assessed 
technically. 

What would be the consequences if Japan or 
a European nation were to have the first com- 
plete set of ordered DNA clones representing all 
human chromosomes, or the first reference se- 
quence of the human genome? Such questions are 
best answered primarily by the scientific and
technical merits of the projects, not by an appeal to
a vague notion of national prestige. If projects are
technically unsound or uneconomical, then the
United States would not benefit from a commitment
to them. Other countries could do so, but
they would only be hurting themselves. If the
projects are technically sound, then the United
States would do well to lead or at least participate
in them, but national prestige would not be
the principal justification for involvement. National
prestige is not a useful basis for judging major 
scientific or technical projects. 

Military Applications 

Military applications of results of genome 
projects should not prove to be a major consider- 
ation in technology transfer. U.S. policies ban the 
export of goods and technologies that could be 
used for military purposes by specified hostile 
countries. Such policies are administered by the 
Department of Commerce in consultation with the 
Department of Defense and the Department of 
State. The export of some goods produced using 
biological technologies could be affected (1). At 
present, however, DNA mapping, sequencing, and 
other means of analysis relevant to genome 
projects are not on the list of controlled technol- 
ogies, and this should remain true for the fore- 
seeable future (2). The main reason is that the tech- 
nologies and data resulting from genome projects 
would not have immediate military applications. 
Like other technologies and data, some could con- 
ceivably be used for a military purpose, such as 
devising vaccines against biological warfare 
agents, but genome projects would not in them- 
selves promote biological warfare. 

Appendix A

Topics of OTA Contract Reports 



The following reports were prepared by outside
contractors for the Office of Technology Assessment for
this assessment. They are available on microfiche or 
as hard copy from the National Technical Information 
Service (NTIS), 5285 Port Royal Road, Springfield, VA 
22161; tel: (703) 487-4650. 

• Mapping Our Genes Contractor Reports, Vol. 1,
  Order No. PB 88-160 783/AS

  - Bibliometric Analysis of Work on Human Gene
    Mapping, Samuel R. Reisher and Michael B. Albert,
    CHI Research/Computer Horizons, Inc.

  - Medical Implications of Extensive Physical and
    Sequence Characterization of the Human Genome,
    Theodore Friedmann, Center for Molecular Genetics,
    School of Medicine, University of California,
    San Diego

  - Mapping the Human Genome: Some Ethical
    Implications, Jonathan Glover, New College, Oxford
    University

  - Mapping and Sequencing the Human Genome:
    Considerations From the History of Particle
    Accelerators, John L. Heilbron, University of
    California, Berkeley, and Daniel J. Kevles, California
    Institute of Technology

  - Mapping the Human Genome: Historical
    Background, Horace Freeland Judson, The Johns
    Hopkins University

  - Long-Term Implications of Mapping and Sequencing
    the Human Genome: Ethical and Philosophical
    Implications, Mark A. Lappe, College of Medicine,
    University of Illinois at Chicago

• Mapping Our Genes Contractor Reports, Vol. 2,
  Order No. PB 88-182 805/AS

  - The Mapping and Sequencing of Genomes: A
    Comparative Analysis of Methods, Benefits and
    Disbenefits, Stephen M. Mount, Department of
    Biological Sciences, Columbia University

  - Mapping the Human Genome: Experimental
    Approaches for Cloning and Ordering DNA Fragments,
    Richard M. Myers, Department of Physiology,
    University of California, San Francisco

  - Mapping and Sequencing the Human Genome in
    Europe, Peter A. Newmark, Nature

  - Application of Human Genome Mapping for the
    Global Control of Genetic Disease, Sir David J.
    Weatherall, Nuffield Department of Clinical Medicine,
    John Radcliffe Hospital, Oxford University

  - In Search of the "Ultimate Map" of the Human
    Genome: The Japanese Efforts, Akihiro Yoshikawa,
    Berkeley Roundtable on the International Economy,
    University of California

OTA also sponsored two workshops during this 
assessment, and the transcripts of those workshops 
have been submitted to NTIS as: 

• Mapping Our Genes, Transcript of Workshop
  "Issues of Collaboration for Human Genome
  Projects," June 26, 1987, Order No. PB 88-162 797/AS;
  and

• Mapping Our Genes, Transcript of Workshop
  "Costs of Human Genome Projects,"
  Aug. 7, 1987, Order No. PB 88-162 813/AS.

The following report was written for OTA but was
not sent to NTIS because it will be available to the public
through a law journal article:

• "Sharing of Research Results in a Federally
  Sponsored Gene Mapping Project," Susan Rosenfeld,
  Science and the Law Committee, Association of the Bar
  of the City of New York, August 1987; to be
  published by the Rutgers Computer and Technology
  Law Journal, vol. 14, No. 2, 1988.

Appendix B

Estimated Costs of Human 
Genome Projects 



Congress has primary responsibility for funding re- 
search through Federal agencies because of its respon- 
sibility for the national budget each year. Appropriat- 
ing Federal funds for any special genome projects will 
therefore fall to Congress. These appropriations will 
express Congress' judgment regarding the relative 
value of genome projects. In setting appropriation 
levels, Congress will weigh the costs of the programs 
against their anticipated benefits (in economic and social
terms) and will balance the value of proceeding
against the costs of not doing so, as measured in lost 
benefits or opportunities. 

Proposals for genome projects are intended to sup- 
port research, but research needs are inherently un- 
predictable: Technological breakthroughs could dra- 
matically diminish budget needs, and unanticipated 
obstacles could just as dramatically increase them. Estimates
for near-term projects using existing technologies
are necessarily more accurate than those for future projects that
presume technological developments. Costs for some
of the larger components, such as sequencing significant
portions of human or nonhuman DNA, hinge on
unit costs that are highly uncertain now and are rapidly
changing due to technical advances (e.g., the cost
of sequencing a single base pair of DNA). These uncertainties
suggest that a 5-year budget plan is the best
that can be produced, and projected costs for even 
the first 5 years might need to be substantially revised. 
The costs of human genome projects can be separated 
by components, although the boundaries between 
some of them are imprecise. The costs projected in 
this appendix are based on a process followed by OTA 
to generate estimates from internationally recognized 
experts. 

OTA Cost Estimates 

In order to better estimate potential costs of human 
genome projects, OTA held a workshop on August 7, 
1987. At that workshop, there was apparent consensus 
on rough estimated costs of several components and 
confusion or disagreement about many others. A fol- 
low-up letter was sent to workshop participants and 
over 150 experts from executive agencies, universities,
and corporations to confirm estimates made at
the workshop and to expand them. Replies were received
from over 70 persons. The revised cost estimates
were externally reviewed by over 100 individuals and
institutions in a draft report circulated in November
1987, and some minor revisions are based on com- 
ments received during this review. The resulting cost 
projections attempt to include most of the direct costs 
of research. They do not include indirect costs of 
university administration (although they do include 
administration in Federal agencies). 

In some cases, it may prove possible to attract funding
from the private sector (foundations, medical research
institutes, or corporations). If so, Federal spending
could be correspondingly reduced. In many cases,
however, the Federal Government will eventually pay
the full costs. If a company developed mapping and
sequencing information or new instruments, for example,
the first (and for a long time the predominant)
users would remain researchers funded to do biomedical
research by the Federal Government. This would
be the case for most technologies developed as part 
of human genome projects (use by researchers being 
the primary goal of the enterprise). A company's investment
would thus be charged back to the government
by charging for use of information or purchase
of instruments by the research community. In some 
cases there may be a market for products outside the 
biomedical research community. If so, private sector
funds could indeed displace government funds.
Funding from research foundations, medical institutes,
and other philanthropies would also, as a rule, substitute
directly for government costs.

There was strong consensus about the importance 
and feasibility of improving the research infrastruc- 
ture (databases and repositories) and generating 
genetic and physical maps of human and nonhuman 
chromosomes; there was substantial uncertainty about 
sequencing strategies and their associated costs. It is 
agreed that the need for new technologies is paramount,
but there is disagreement about how much it
would cost to develop them or how such efforts should 
be organized. 

Discussion in the following sections reviews costs 
by component. 

Computers and Computational Methods 

Cost estimates for the necessary personnel, research, 
and equipment are $12 million per year for the early 
years, increasing to 15 percent of the overall budget
as it exceeds $80 million annually. This would be in
addition to continued support of existing databases and
computer facilities. Spending should be relatively flat 
over time, because hardware will have to be purchased 
in early years and research will take an increasing 
proportion of the budget in later years. Hardware will 
have to be upgraded, however, so cost estimates are 
necessarily uncertain for future years. While it is logical
to link computational needs to human genome
projects, funding devoted to storage of genetic data
and sophisticated analysis of DNA will prove important
in molecular biology even if maps are not completed
and other human genome projects are not
funded. 

Genetic Maps 

Genetic mapping has been conducted for several
years, and a rough map of human chromosomes already
exists. Discussion at the OTA cost workshop centered
on a map with two to four times the resolution
of current maps. Subsequent letters and discussions
have centered on a further increase in resolution,
preferably such that a gene being studied would be 
separated from its closest DNA markers (on average) 
in only 1 of 100 family members. (Geneticists call this 
a 1-centimorgan map.) Estimates based on existing procedures
yield costs of $6 million per year for
5 years. Since this is an existing technology and there 
are already facilities to do the mapping, a startup 
period is not needed. Funds saved from new methods 
could be devoted to automating the processes so fur- 
ther refinements of maps in humans and other organ- 
isms would be easier to construct in the future. 

The two principal groups constructing human
genetic marker maps to date have not been federally
supported. One has used private corporate funds, and
the other has been funded by the Howard Hughes Medical
Institute (HHMI). HHMI-sponsored work is a nearly
direct substitute for government funding. Future work
would be of greater magnitude, however, and may re- 
quire Federal investment. In the case of work sup- 
ported by Collaborative Research (the largest corporate 
group) and other companies, the Federal Government 
will probably pay for access to the probes either as 
a lump sum (to obtain access for all federally funded 
researchers) or indirectly (as federally funded re- 
searchers pay for access to individual DNA markers 
or mapping services). 

Physical Maps 

Physical maps would be quite useful for future research.
Ordered clone sets linked to them would be
even more useful. Pilot projects on selected human
chromosomes and on many lower organisms are in
progress, and a useful set of ordered clones from all
the human chromosomes may be feasible in the next 
5 to 10 years. 

Projections based on existing technology yield costs 
of $60 million for a usefully complete set of ordered
cosmid clones over 5 years. New technologies may per- 
mit the creation of ordered sets of much larger DNA 
fragments (using yeast artificial chromosomes, YACs), 
and these would be extremely useful also. Costs of con- 
structing ordered libraries composed of both cosmid 
and YAC clones are estimated at $70 million. 

There are substantial uncertainties regarding both 
types of clones. Physical mapping of human chromo- 
somes using cosmid clones has only begun in the last 
year, and therefore the rate and completeness of such 
mapping are highly uncertain. Mapping with yeast arti- 
ficial chromosomes is much newer, although promis- 
ing. The main uncertainty regarding YACs is not cost, 
but feasibility: If such mapping is possible, it would 
be substantially less expensive than mapping using cosmids
(although cosmid maps might be needed for many
research applications). 

Ordered clone libraries are difficult to complete.
Progress is rapid at first, but it is unlikely that a chromosomal
region can be spanned without gaps between
groups of continuous clones. Maps complete enough 
to be useful can be expected from several years' effort,
but if truly complete maps are necessary, then
efforts must be continued, perhaps at funding levels
equal to those for initial construction. Half or more
of the total effort may be required for the last 10 percent
of the maps. Clone libraries with gaps are quite
useful, however, because a chromosomal region of interest
is likely to be represented even in incomplete
libraries. 

Cost estimates start at $10 million for the first year 
(building on current Federal expenditures), rise to $20 
million in $5 million increments over 2 years, and then 
drop to $10 million (with the proviso that continued 
higher funding may be necessary if complete maps are 
deemed essential). 

Projects To Link Genetic and 
Physical Maps 

Identifying the parts of DNA that carry the instruc- 
tions for making protein and integrating them into 
genetic and physical maps would be very useful. Which 
stretches of DNA are actually used to produce protein 
varies with the tissue (many genes are expressed differ- 
ently in different tissues or stages of development). 
The likely process would be to make DNA copies 
(cDNA) of the RNA that is translated into protein from 
a variety of tissues (both healthy and diseased) and at 
various stages of development. Locating cDNAs on
physical and genetic maps would result in cDNA maps.

Such maps could be used to pick out protein-coding 
regions along stretches of DNA of unknown function. 
This would make the physical map much more useful,
by highlighting regions of particular interest, and
would provide a missing step in the search for genes
whose approximate location had been determined by
genetic linkage maps. DNA sequencing might also begin
by using cDNAs to select regions likely to be of
interest (because they are known to produce protein).
Maps of cDNA would give clues to a gene's function
if the pattern of expression related to a known biological
process. Comparison of cDNAs from human and
other organisms can give clues to function by relating
expression to degree of evolutionary relatedness. If
a genetic disease is located in a certain chromosomal
region and cDNA maps show that one DNA segment
from that region is transcribed only in the tissue affected
by the genetic disease, then the gene corresponding
to the cDNA is a good candidate for the
gene causing the disease. Maps of cDNAs have been
suggested by several groups [2,3,11,12,13,15].

The first step in constructing cDNA maps would be 
to collect and organize existing sets of cDNA clones.
New sets of cDNA clones could be made from missing
tissues, disease states, developmental stages, or organisms.
The various cDNAs could then be located on
genetic linkage maps and physical maps. The cost of
this process is highly uncertain, in part because the
number of genes in human and many other organisms
is not known. Those specifically asked about this component
estimated that its costs would likely range from
$2 million to $5 million per year, depending on how
much work could be done by merely cataloging existing
cDNA clone sets; how many new sets would have
to be constructed; how many organisms, tissues, developmental
stages, and disease states would be used
as sources; and the extent of genetic and physical maps.
The costs of cDNA mapping would increase with the
increasing detail of genetic and physical maps. OTA
estimates start from a base of $2 million, increasing
annually by $1 million increments.

Resource Material Repositories 

Estimated costs of storing the clone sets linked to 
physical maps, cell lines for genetic research, and the
various DNA analytical materials for genetic mapping
originally ran to over $250 million. The largest component,
dwarfing all others, was the cost of storing
the DNA clones linked to physical maps. Such storage
costs are virtually prohibitive, and these estimates were
dropped. Subsequent discussions with experts on storage
of materials for molecular biology, specifically with
persons at the American Type Culture Collection,
yielded storage estimates an order of magnitude lower.
The estimates summarized in table B-1 are for collection
and storage of clone sets. Costs of dissemination
would be borne by users through user fees. Costs of
collecting and storing mutant cell lines and DNA analytical
materials (such as probes) have not been included.

Sequencing 

There is little consensus on how much DNA sequencing
should be done as part of genome projects, particularly
whether a complete human reference sequence
should be an objective. There is consensus,
however, that sequencing technology is crucial and
ripe for innovation. Cost projections should become
easier in 2 to 3 years, as the first automated DNA se- 
quencing machines are improved; massive sequenc- 
ing would not begin in most schemes for several years, 



Table B-1.—OTA Budget Projections for Genome Projects (millions of dollars, adjusted to 1988)

Component                        Year 1   Year 2   Year 3   Year 4   Year 5
Computers and analysis               12       12       17       24       29
Genetic maps                          6        6        6        6        6
Physical maps                        10       15       20       20       10(a)
cDNA maps                             2        3        4        5        6
Resource material repositories        1        2        3        4        5
Sequencing                            —        —       15(b)    20(b)    45(b)
Quality control                       —        1        2        3        4
Technology development               10(b)    20(b)    50(b)    75(b)   100(b)
Training                              4        6        8       10       12
Administration                        2        3        6        9       11
Total                                47       68      131      186      228

(a) May require upward adjustment for map closure.
(b) Subject to considerable uncertainty, depending on technical improvements, strategy, and unit costs.
SOURCE: Office of Technology Assessment, 1988.
except as part of pilot projects and for DNA regions
known to be of special interest [6]. The debate about 
sequencing involves disagreement about the costs of 
sequencing per base pair, the amount of redundancy 
necessary to make a sequence useful, the expected 
pace of technological improvements, which laboratory 
preparation steps are included, and how much DNA 
would be sequenced as part of human genome projects 
(rather than through traditional funding mechanisms).
Estimates of the cost of sequencing vary widely, rang- 
ing from several pennies to several dollars per DNA 
base [6,16], but there is some agreement that costs 
would drop to $0.20 to $0.30 per base pair by the end 
of 1988, based on existing technologies. Some of the 
discrepancy in the estimates comes from including 
different components. The costs of special cloning procedures,
preparing DNA, use of reagents, technician
time, and capital costs of instrumentation should all 
be included in cost estimates. 

Judgments about which technologies to use and how 
much sequencing should be performed are best made 
each year by an advisory committee with access to 
technical experts. Such judgments would presumably 
be based on costs, the availability of material to se- 
quence, and consensus on which regions to sequence 
first. For OTA projections, a few assumptions have 
been made. For the first 2 years' budget, sequencing 
would be covered as technology development, performed
on lower organisms or human chromosomal
regions of known interest, for possible sequencing on
a larger scale. For years 3 to 5, it would be based on 
sequencing one small chromosome per year at $0.20 
per base pair ($30 million per year, based on threefold 
redundancy and 50 megabases per year), permitting 
a phase-in period for implementation of the technologies.
This estimate is for purposes of budgeting only,
however, and could prove wildly high or low. If the
technologies for cloning, preparing DNA for sequencing,
and finally determining a DNA sequence become
significantly cheaper, as some experts predicted at the 
OTA workshop, the amount of DNA sequenced at that 
cost could be increased. If costs remain high, only a 
limited amount of DNA could be sequenced, accord- 
ing to priorities set by the oversight board. After the 
fifth year, the budget could go up or down in propor- 
tion to need. 
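
The $30 million annual figure quoted above follows from simple arithmetic, sketched below for illustration. The function name and its parameters are assumptions introduced here; they are not part of the OTA analysis, and the sketch covers only the unit-cost calculation, not the phase-in.

```python
from decimal import Decimal

def sequencing_cost(cost_per_base_pair, megabases_per_year, redundancy):
    """Annual sequencing cost in dollars (hypothetical helper).

    Each base pair of the target region is read `redundancy` times,
    so the number of bases actually sequenced is the target size
    multiplied by the redundancy.
    """
    bases_read = Decimal(megabases_per_year) * Decimal(1_000_000) * Decimal(redundancy)
    return Decimal(cost_per_base_pair) * bases_read

# Figures quoted in the text: $0.20 per base pair, 50 megabases
# per year, threefold redundancy.
annual = sequencing_cost("0.20", 50, 3)
print(f"${annual / Decimal(1_000_000):.0f} million per year")
```

Under the same assumptions, changing only the unit cost shows how sensitive the budget line is to the per-base-pair price, which is the point made in the text about estimates proving high or low.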

Quality Control and Reference 
Standards 

The large amounts of map and sequence information
and new materials created by human genome
projects will be useful only if the information is ac- 
curate and resource materials are cataloged reliably. 



If there are many different groups involved in the ef- 
forts, problems of quality control could impede use- 
ful applications. The scope and magnitude of this prob- 
lem will become clear only when the technologies are 
defined and the results of mapping and sequencing 
efforts begin to accumulate. Special budget allocations
for comparing results from different groups or to 
establish measurement standards may become neces- 
sary. Budget needs for quality control will be nil in the 
first year and will grow in early years until they constitute
5 percent of the overall budget. For initial estimates,
they are projected to grow by $1 million per year
from a base of zero. 

Technology Development 

Investments in methods and instruments associated 
with genome projects are likely to lead quickly to com- 
mercial applications. The objective for technology de- 
velopment is open-ended, however, and it could be 
either the largest component or a relatively small fraction
of genome projects. Responses to OTA letters and
drafts showed no consensus on the proper budget. 
Many scientists familiar with industrial development 
encouraged higher figures, while academic molecular
biologists set lower ones. A maximum figure of $500
million to be spent over 5 years was mentioned at the
OTA workshop, in line with recommendations of a
committee established by the Department of Energy
(DOE). There was some support for the alternative of
devoting 25 percent of the total budget to technology
development [6]. Minimum estimates were for a steady
state of $20 million to $30 million. Several individuals
noted the importance of developing technologies early
on, while recognizing the need to keep early budgets
realistically low because a new research program 
would require the accumulation of trained personnel 
and pilot work to provide a foundation for later work. 

The approach used in OTA estimates is to increase 
funding from $10 million the first year to a stable fig- 
ure of $100 million by yearly increments. Funding for 
biological instrumentation centers under the National 
Science Foundation might account for part of this, and 
methods or instruments of great interest to industry 
might lead to some cost sharing with private firms. 
If so, Federal funding could be reduced accordingly.
Technology development funding, like sequencing, is 
among the most flexible of the proposed projects and 
could be adjusted by the oversight board and the con- 
gressional appropriations process. 

Training of Personnel 

Training of investigators and scientific exchange
among participants are crucial and would include graduate
and postgraduate fellowships, scientific workshops,
and national scientific meetings. Some persons
urge that fellowship funds be targeted to shortage 
areas, but others believe that targeted programs are 
less effective than untargeted ones for the best people
in any relevant discipline. If training were targeted,
it might include development of dual expertise in computers
and molecular biology, organic chemistry and
molecular biology, engineering and molecular biology, 
and clinical medicine and informatics molecular bi- 
ology. Training would also be needed for technicians, 
and for sabbaticals for scientists interested in shifting 
from their fields to genome projects. Workshops 
among participating groups and national symposiums 
to communicate results would permit rapid dissemination
of new methods and insights. Exchange programs
among industrial, national laboratory, and academic
scientists would promote technology transfer.
Training and personnel costs are estimated to merit 
10 percent of each annual budget. For initial projec- 
tions, funding might start at $4 million and increase 
yearly by $2 million. 

Administrative Costs 

Participants in the August 1987 OTA workshop estimated
that 1 to 3 percent of each year's budget would
be needed for administrative overhead. That estimate
was subsequently increased to 5 percent in response
to letters and after analyzing administrative costs at
Federal research agencies. Administrative costs include
operation of a national advisory board; oversight of
databases, repositories, networks, and other services;
setting instrumentation standards for cloning, mapping,
and sequencing technologies; administration of
grants and contracts; and other purposes. Some additional
features would be unique to genome projects,
for example, analysis of likely social impacts and ethi- 
cal dilemmas created or intensified by genome projects. 
The need for such analysis has been explicitly noted 
in hearings and has been highlighted by research 
agency administrators and congressional staff. It could 
be obtained through grants to bioethicists, lawyers, 
economists, and social scientists for publications or 
workshops on various topics. 

Summary 

The costs of the components of human genome projects
are projected in table B-1. These would start from
a base of $47 million in fiscal year 1989 (if 1989 were 
the first year) and increase to $228 million in fiscal 
year 1993. It is not useful to project budgets beyond 
then, because technological development is so uncertain.
The projected figures do not attempt to assign
functions to particular agencies, merely to state overall
direct research costs. Future budgets will need to
be revised in light of actual appropriations. 
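
The yearly totals can be cross-checked against the 5-year figure given for OTA in table B-2, and the training and administration rows can be compared with the percentage rules of thumb stated in the text (10 percent and 5 percent). The short sketch below is illustrative only: the variable and function names are assumptions, and the row values are copied from table B-1.

```python
# Yearly figures in millions of dollars, as quoted in table B-1.
totals = [47, 68, 131, 186, 228]
training = [4, 6, 8, 10, 12]
administration = [2, 3, 6, 9, 11]

def share(component, total):
    """Return each year's component as a percentage of that year's total."""
    return [100 * c / t for c, t in zip(component, total)]

five_year_total = sum(totals)
print(f"5-year total: ${five_year_total} million")

for name, row in [("training", training), ("administration", administration)]:
    percentages = ", ".join(f"{p:.1f}%" for p in share(row, totals))
    print(f"{name}: {percentages}")
```

The comparison suggests that the initial dollar projections track the administrative rule of thumb more closely than the training one, which is consistent with the text's description of them as starting points rather than fixed shares.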

History of Earlier Estimates 

Perhaps the earliest evidence of a human genome 
project is found in a letter from Robert L. Sinsheimer, 
then Chancellor of the University of California, Santa 
Cruz (UCSC), to University of California President 
David Pierpont Gardner, on November 19, 1984. A potential
benefactor had withdrawn support from a
project, and Sinsheimer took the opportunity to pro- 
vide a counterproposal that might interest the benefac- 
tor. In doing so, Sinsheimer suggested that a human 
genome institute be founded at UCSC, with startup 
costs of $25 million and an annual operating budget 
of $5 million. This was, in effect, the first cost esti- 
mate for a human genome project. 

The letter from Sinsheimer to Gardner referred to
an enclosure, later to be used as a basis for discussion
at the May 1985 Santa Cruz Human Genome Workshop,
in which the institute was formally proposed [7].
The proposal assumes existing mapping technologies
and a continued rate of development of DNA sequencing
speed equal to the exponential increase of the past
decade. The proposal then concludes that "within a
few years, the human genome could be reduced to
an ordered set of cloned fragments" and that "50 technicians
could approach completion of the [sequencing]
project in 10-20 years." The proposal estimates the
yearly support of each technician at $100,000, yielding
an annual budget of $5 million and a total project
cost of about $100 million. The proposal also calls for
$25 million for startup facilities, and it distributes the
operating money among the mapping and sequencing
project itself (75 percent), developing techniques (10
percent), application to basic biology and medicine (10
percent), and education and training of students and
other personnel (5 percent).
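
The arithmetic behind the UCSC proposal can be laid out as a short sketch. All names below are illustrative assumptions; the inputs (50 technicians, $100,000 per technician per year, 10 to 20 years, and the percentage split) are taken from the proposal as described above.

```python
TECHNICIANS = 50
SUPPORT_PER_TECHNICIAN = 100_000  # dollars per technician per year

annual_budget = TECHNICIANS * SUPPORT_PER_TECHNICIAN

def person_years(years):
    """Person-years of technician effort over the given duration."""
    return TECHNICIANS * years

def operating_cost(years):
    """Total operating cost in dollars over the given duration."""
    return annual_budget * years

# Percentage split of the operating money described in the proposal.
OPERATING_SPLIT = {
    "mapping and sequencing": 75,
    "developing techniques": 10,
    "application to biology and medicine": 10,
    "education and training": 5,
}
allocation = {item: annual_budget * pct // 100
              for item, pct in OPERATING_SPLIT.items()}

for years in (10, 20):
    print(f"{years} years: {person_years(years)} person-years, "
          f"${operating_cost(years):,}")
```

The 20-year case reproduces the total of about $100 million, and the two durations bracket the 500 to 1,000 person-years listed for the position paper in table B-2. The $25 million for startup facilities is separate from these operating costs.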

The Santa Cruz workshop displays similar optimism 
about the mapping aspect of a genome project, sug- 
gesting that a physical map could be completed by a 
20-person group in 3 to 5 years [14]. The workshop 
also included discussion of a restriction fragment 
length polymorphism (RFLP), or genetic, map. This map 
could be achieved in "a few years" at a resolution finer 
than 50 centimorgans. Based on then current technol- 
ogies, sequencing the 3 billion base pairs of the hu- 
man genome was taken as "not feasible." The work- 
shop went beyond the initial proposal and discussed 
details about the computer requirements for a project. 
There was, however, no explicit cost estimate for these 
details. 
The next round of cost estimates came out of DOE's
workshop in Santa Fe in March 1986. Appended to the
workshop notes and the correspondence the workshop 
generated between the participants and DOE's Mark 
Bitensky was a cost estimate by Christian Burks of the 
Los Alamos National Laboratory. Burks calculates the
person-years required for various aspects of the
project, which, for a physical mapping and sequencing
endeavor, including computer and administrative
costs and assuming some sequencing advances, total
3,505 person-years [5]. Allowing for hardware and
overruns of his estimate, Burks concludes a genome 
project would cost between $0.5 and $2.5 billion. 

The next major meeting, at Cold Spring Harbor, fo- 
cused primarily on sequencing. The only estimate to 
issue from the discussion was the oft-quoted 30,000 
person-years required to sequence the human genome 
one time through [8]. This estimate, translated into
$3 billion by either $1 per base pair or $100,000 per
person-year, was based solely on existing technology
and was therefore obsolete within days of the conference,
when the automated sequencer at the California
Institute of Technology was announced [15].
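
The two routes to the $3 billion figure, and the conversion of person-year estimates to dollars more generally, can be sketched as follows. The function and constant names are assumptions for illustration. Applying the $100,000 rate to the Santa Fe person-year total is also an assumption made here, and it leaves out the hardware and overrun allowances behind that workshop's $0.5 to $2.5 billion range.

```python
COST_PER_PERSON_YEAR = 100_000        # dollars, the rate quoted in the text
GENOME_BASE_PAIRS = 3_000_000_000     # 3 billion base pairs

def person_years_to_dollars(person_years, rate=COST_PER_PERSON_YEAR):
    """Convert an effort estimate in person-years to dollars."""
    return person_years * rate

# Cold Spring Harbor estimate, reached two ways.
by_effort = person_years_to_dollars(30_000)
by_unit_cost = GENOME_BASE_PAIRS * 1  # $1 per base pair

# Santa Fe person-year total at the same rate (labor only).
santa_fe_labor = person_years_to_dollars(3_505)

print(f"By effort:    ${by_effort:,}")
print(f"By unit cost: ${by_unit_cost:,}")
print(f"Santa Fe labor only: ${santa_fe_labor:,}")
```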

By the middle of 1986, the Caltech sequenator had 
made it clear that advances in sequencing technology 
would drive costs down. HHMI's Informational Forum
at the National Institutes of Health in Washington, D.C.,
continued to quote the 30,000 person-year estimate, 
but it also cautiously offered an estimate of 300 person-years,
assuming a two-order-of-magnitude increase
from automation [17]. The HHMI forum likewise gave 
a dual estimate for the physical map (200 person-years, 
or 30 to 40 with automation advances) [4], and for com- 
puter storage of sequence information ($0.30 per base 
pair, $0.03 with advances) [1]. 

Nine months later, DOE brought out its own cost estimates,
presented as a yearly budget for a genome
project. In the Health and Environmental Research 
Advisory Committee report, the subcommittee scien- 
tists estimate that sequencing, with redundancy for 
accuracy, would cost $60 million, assuming advances 
in automation [19]. Sufficient automation should be 
available 5 years hence [10]. The remainder of the budget
is not described in detail, but it does specify that
$500 million will go to various aspects of technological
development, including mapping and informatics,
assuming $100,000 per person-year of research [9]. The
total for the DOE-proposed projects comes to $1.02 
billion. Table B-2 presents a summary of estimates. 

The National Research Council of the National Academy
of Sciences established the Committee on Mapping
and Sequencing the Human Genome, whose report
was released in February 1988 [13]. That report
represents the views of an exceptionally distinguished 
panel of experts from diverse scientific backgrounds. 
The panel members began their deliberations with
widely differing knowledge of the state of gene map- 
ping and divergent opinions about the merit of special 
research efforts. While writing the report, the panel 
reached a consensus that a special effort was merited
and recommended additional funding of $200 million 
per year. This level would be reached over the initial 
3 years. During the first few years, the budget would 
be roughly divided into $120 million for research in 
10 or so multidisciplinary centers and numerous small 
research groups. Construction and materials would 
cost $55 million per year, and $25 million would operate
repositories, databases, training, and administrative
functions. In later years, the budget would increase
for dedicated production of map and sequence data.
This $200 million annual budget would continue until 
at least the year 2000. 
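
The NRC components can be checked against the $200 million annual figure and the cumulative total listed in table B-2. The sketch below is illustrative: the names are assumptions, the $55 million construction figure and the 15-year horizon are those shown in table B-2, and the 3-year ramp-up to full funding is ignored.

```python
# Annual NRC budget components in millions of dollars.
nrc_components = {
    "research in centers and small groups": 120,
    "construction and materials": 55,
    "repositories, databases, training, administration": 25,
}

annual_total = sum(nrc_components.values())

def cumulative(annual, years):
    """Cumulative cost in millions of dollars at a flat annual level."""
    return annual * years

print(f"Annual total: ${annual_total} million")
print(f"Over 15 years: ${cumulative(annual_total, 15):,} million")
```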
Table B-2.—Comparison of Genome Cost Estimates (millions of dollars (M) or person-years (py))

Source                  RFLP map      Physical map    Sequence       Computing     Repository  Other                               Total

UCSC position paper,                  "a few years"   500-1,000 py                                                                 $25 M facilities
  11/19/84

UCSC workshop,          "a few years" 60-100 py       "not feasible"
  5/24/85               (<50 cM)

Santa Fe workshop,                    55 py           3,000 py       300 py +                  150 py administration               $500-2,500 M(a)
  3/86                                                               hardware

Cold Spring Harbor                                    30,000 py
  Symposium,
  5/28-6/4/86

HHMI/NIH Informational                200 py or       30,000 py or   $.30/bp or
  Forum, 7/23/86                      30-40 py(b)     300 py(b)      $.03/bp(b)

DOE/HERAC,                                            $60 M or                                 $500 M technology                   $1,020 M
  4/87                                                6,000 py(a)

NRC                                                                                            $60 M/yr 10 centers                 $3,000 M
                                                                                               $60 M/yr grants and technology      (over 15 yrs)
                                                                                                 development for small groups
                                                                                               $55 M/yr(c) facility construction
                                                                                                 (early years, decreasing later)
                                                                                               $25 M/yr administration, quality
                                                                                                 control, advisory committee
                                                                                                 functions
                                                                                               $200 M/yr

OTA(d),                 $30 M (1 cM)  $70 M, YAC and  $60 M, not     $12 M min,    $15 M       $10 M quality control               $660 M
  8/87-1/88                           cosmid          complete       15% of total              $20 M linking projects              (first 5 yrs only)
                                                                                               5% administration
                                                                                               $255 M technology development
                                                                                               $40 M training

(a) Assumes $100,000 per person-year.
(b) Not consensus figures but individual opinions.
(c) Money for facilities in early years would go to mapping and sequencing in later years.
(d) Estimates for first 5 years only. Does not assume complete reference sequence. For details, see text.

Abbreviations: DOE/HERAC—Health and Environmental Research Advisory Committee, Department of Energy; HHMI—Howard Hughes Medical Institute; NIH—National Institutes of Health; NRC—National Research Council; OTA—Office of Technology Assessment, U.S. Congress; UCSC—University of California at Santa Cruz.

SOURCES: UCSC: personal communications from Robert Sinsheimer, Chancellor, UCSC, January 1987 and August 1987; Santa Cruz Human Genome Workshop (SCHGW), "Notes and Conclusions," June 4, 1985; and Bob Edgar, Harry Noller, and Bob Ludwig, "Human Genome Institute: A Position Paper," enclosure in personal correspondence from Robert L. Sinsheimer to David Pierpont Gardner (Nov. 19, 1984) and distributed for Santa Cruz Human Genome Workshop (May 24 to 29, 1985); Santa Fe Workshop: Christian Burks, "The Cost of Sequencing the Complete Human Genome," App. VI, Genome Sequencing Workshop, Santa Fe, NM, Mar. 3 and 4, 1986; HERAC: U.S. Department of Energy (DOE), Report on the Human Genome Initiative, Subcommittee on Human Genome of the Health and Environmental Research Advisory Committee, April 1987; Cold Spring Harbor Symposium: Walter Gilbert, comments at Cold Spring Harbor Laboratories Symposium, "Molecular Biology of Homo sapiens," May 28 to June 4, 1986; HHMI/NIH symposium: remarks of George Bell, Sydney Brenner, and David Smith, at Howard Hughes Medical Institute Informational Forum on the Human Genome, July 23, 1986; NRC: National Research Council, National Academy of Sciences, Committee on Mapping and Sequencing the Human Genome, Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, Feb. 1988); OTA: "Costs of Human Genome Projects" Workshop, Aug. 7, 1987, with subsequent summary letter and review, October 1987, and external review of cost projections, December 1987.
Appendix D

Databases, Repositories, and Informatics



Among the most useful products of genome projects will be
information and materials: information about genes and their
locations and sequences, and biological materials such as DNA
fragments from chromosomes of known pedigree, ordered cosmids,
and clones. Proper management of data and materials is essential
to increase the efficiency and productivity of research and to
reduce duplication of efforts so that genome projects can succeed
in meeting the needs of medical scientists and molecular
biologists in this century and the next.

Existing databases and repositories that gather, maintain,
analyze, and distribute data and materials are already struggling
to keep up with the exponential growth of molecular biology.
Present capabilities will have to expand greatly to handle the
increase of information resulting from a targeted set of genome
projects. While it is logical to link computational needs to
genome projects, funding devoted to storage of genetic data and
materials and to sophisticated analysis of DNA will prove
important in molecular biology even if a major mapping and
sequencing initiative is not undertaken. Because the essential
databases, repositories, and linking computer networks provide
goods and services for the entire research community, the Federal
Government has a long-standing tradition of supporting them and
is in a unique position to further enhance the resources.

This appendix describes some existing databases and 
repositories and outlines present and future database 
needs relevant for human genome projects specifically 
and molecular biology in general. 

Databases 

Various databases exist that serve the needs of researchers in
genome mapping and sequencing (see table D-1). One set of
databases gathers, stores, and distributes information directly
related to genetic maps and physical maps. Some databases
specialize in map and sequence information from one specific
genome (for example, there are databases exclusively devoted to
the mouse, E. coli bacteria, Drosophila, and nematode genomes),
while others carry particular kinds of information from all the
relevant genomes. Other databases gather data on the sequences
and structures of proteins and amino acids that are not direct
results of mapping and sequencing research but are necessary for
addressing basic research problems underpinning genome research.
The data from the different types of maps and from different
species have important interconnections, so it is essential that
the information be linked for comparative studies.

Genetic Maps 

Genetic maps can be generated in several ways (see
ch. 2). Pedigree analysis of linked traits yields a map 
in which traits can be ordered sequentially and with 
a rough estimate of the distance between them. RFLPs 
and other DNA probes can help link the traits with 
specific genes or regions of DNA to produce more re- 
fined maps. Maps of the functional regions within in- 
dividual genes aid in the search for underlying causes 
of genetic diseases and for the mechanisms by which 
genes control development and function. Several 
different databases serve the different information 
needs for specific kinds of maps. 

On-Line Mendelian Inheritance in Man (OMIM). An atlas of human
traits that are known to be inherited (expressed genes) has been
compiled into a reference work known as Mendelian Inheritance in
Man, which has been published in seven editions. The listing has
been edited by Victor McKusick of The Johns Hopkins University
since 1966. As of March 1, 1988, 4,336 traits had been identified
as genetically based, including over 2,000 diseases.

Since 1986, the Howard Hughes Medical Institute 
(HHMI) has supported computerization of the list, and 
it is now accessible for on-line searches free of charge 
(4). It is cross-referenced in the Human Gene Mapping 
Library so that information on expressed genes can 
be linked to map data. 

Human Gene Mapping Library (HGML). Also called
the New Haven Database, HGML consists of five linked 
databases— one each for map information, relevant 
literature, RFLP maps, DNA probes, and contacts (re- 
searchers with information on data or materials). In 
addition, the map database is linked to OMIM. All of 
the databases are cross-referenced, so that data about 
a gene or probe of interest can be drawn from all five 
during the same search (10). 
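
The cross-referencing described above can be pictured with a minimal sketch. It is an illustration only: the five toy tables, the shared gene-symbol key, the record contents, and the `search_all` function are all hypothetical and are not drawn from HGML's actual design.

```python
# Five toy tables keyed by a shared gene symbol.
# Every record below is invented for illustration.
MAP_DB = {"GENE_A": {"chromosome": "4", "region": "p16"}}
LITERATURE_DB = {"GENE_A": ["Hypothetical citation 1"]}
RFLP_DB = {"GENE_A": ["RFLP_001"]}
PROBE_DB = {"GENE_A": ["PROBE_17"]}
CONTACT_DB = {"GENE_A": ["Hypothetical Lab, Example University"]}

DATABASES = {
    "map": MAP_DB,
    "literature": LITERATURE_DB,
    "rflp": RFLP_DB,
    "probe": PROBE_DB,
    "contact": CONTACT_DB,
}


def search_all(symbol):
    """Collect the entry for one gene symbol from every linked table.

    A table with no entry for the symbol contributes None.
    """
    return {name: table.get(symbol) for name, table in DATABASES.items()}


result = search_all("GENE_A")
for name, entry in result.items():
    print(name, "->", entry)
```

The point of the shared key is that one query reaches all five tables at once; without a common identifier, each database would have to be searched separately and the results reconciled by hand.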

DNA Nucleotide Sequences 

Databases containing raw DNA sequences, information about the
origin of the DNA segment sequenced (which gene, which organism),
and various annotations that summarize information about
important features in the sequence (sites cut by DNA-cutting
enzymes, regulatory sequences, protein-coding regions) will be
directly affected by genome projects that emphasize sequencing.
The major databases for nucleotide sequences are GenBank® and its
European counterpart, EMBL (8). Each carries sequence data and
related information for the human genome as well as bacterial,
yeast, fruit fly, mouse, and other genomes. Since 1982, GenBank®
and EMBL have split the task of data collection, with each
database monitoring specific journals in molecular biology to
locate and enter sequence data, and they cooperate closely in
sharing and distributing it. They have recently been joined by
the DNA Data Bank of Japan (DDBJ), which is in charge of
monitoring Asian journals and contributing to the reciprocal
exchanges. (DDBJ served primarily as an access node to GenBank®
and EMBL starting in 1984, but did not start gathering its own
data until 1987.)

Table D-1. Some Existing U.S. Databases and Repositories

Database or repository            Location                         Funding source              Annual budget(a)
--------------------------------  -------------------------------  --------------------------  ----------------
Nucleotide sequence data:
  GenBank®                        Los Alamos National Laboratory;  NIH,(b) DOE, NSF, USDA      $3,500,000
                                  Intelligenetics Corp., CA
Genetic map data:
  On-Line Mendelian Inheritance   Johns Hopkins University,        Johns Hopkins University,   $550,000(c)
    in Man (OMIM)                 Baltimore, MD                    HHMI, NLM
  Human Gene Mapping Library      New Haven, CT                    HHMI                        $500,000
    (HGML)
Protein and amino acid sequence and structure data:
  Protein Identification          National Biomedical Research     NIH(d)                      $500,000
    Resource (PIR)                Foundation, Washington, DC
  Protein Data Bank (PDB)         Brookhaven National Laboratory,  NSF, NIH,(e) DOE            $260,000
                                  Upton, NY
Repositories:
  American Type Culture           Rockville, MD                    NIH(f)                      $300,000
    Collection (ATCC)/Human DNA
    Probe and Chromosome Library
  Human Genetic Mutant Cell       Coriell Institute for Medical    NIH(g)                      $750,000
    Repository                    Research, Camden, NJ

(a) Budget figures are approximate. Several of the databases have multiyear contracts; amount listed is the average yearly allotment.
(b) NIH sponsors of GenBank®, past and present, include the National Institute of General Medical Sciences (NIGMS), the Division of Research Resources (DRR), the National Institute for Allergy and Infectious Diseases, the National Cancer Institute, the National Library of Medicine, the National Eye Institute, and the National Institute of Diabetes and Diseases of the Kidney. The NIGMS administers the contract and coordinates the funding.
(c) The Johns Hopkins University contribution to OMIM is difficult to measure, because it includes many indirect factors (staff support, space, publication costs, etc.). HHMI contributes $318,000 and the NLM $100,000 annually.
(d) The NIH sponsor is DRR.
(e) NIH sponsors are NIGMS and DRR.
(f) NIH sponsors are the National Institute of Child Health and Human Development (NICHD) and DRR; DOE has contributed some funds through DRR.
(g) The NIH sponsor is NIGMS.

SOURCE: Office of Technology Assessment, 1988.

GenBank®. GenBank® originated at the DOE's Los Alamos National
Laboratory in 1979 and started to receive funding from the NIH in
1982. It is the major U.S. database for nucleic acid sequence
information from humans and other organisms (3). GenBank® is
presently administered by, and receives a major portion of its
funds from, the National Institute of General Medical Sciences
(NIGMS) of NIH. Data are entered and updated by curators at Los
Alamos and are distributed by Intelligenetics Corp. (Mountain
View, CA).

The amount of data contained in GenBank® has grown exponentially
since its inception. In addition, the number of users has
increased from a small set of one hundred or so who accessed it
when the first NIH contract started to tens of thousands of
scientists who now access it either directly or through
commercial distributors. GenBank®'s new 5-year contract, which
took effect in October 1987, significantly increases funding to
meet the growing demand.

Protein and Amino Acid 
Sequences and Structures 

Databases that gather information on protein and amino acid
structure and function are crucial for the application of genomic
research to clinical and pharmaceutical problems, as well as for
advancing the understanding of basic problems in biology: how
genes function, how they code for proteins and enzymes, and how
their protein products are structured and function (see ch. 2).
The effects of map and sequence data on these databases will
depend on the strategy followed for genome projects. For example,
a concerted nucleotide sequencing effort would affect research on
protein and amino acid structure more slowly than would increased
funding to researchers studying specific genes and their gene
products, which are generally proteins (6).

Protein Identification Resource (PIR). PIR is "a resource
designed to aid the research community in the identification and
interpretation of protein sequence information" (14). It contains
sequence data for proteins and amino acids, with annotations that
indicate known functional regions. PIR is run by the nonprofit
National Biomedical Research Foundation and receives most of its
funding from NIH's Division of Research Resources. Modest user
fees cover the distribution costs; academic users pay a flat fee,
while commercial users are charged by the amount of computer time
they use. PIR has recently started cooperating with the Japan
International Protein Database (JIPID) and the new European
database, Martinsried Institute for Protein Sequence Data (MIPS),
to establish an international data network for protein sequences.

Protein Data Bank (PDB). The Protein Data Bank was founded in
1971 as "an international computerized archive for structural
data on biological macromolecules" (1). It gathers information on
the atomic coordinates of the structure of nucleic acids,
messenger RNA, amino acids, proteins, and carbohydrates that have
been derived from crystallographic studies. Structural
information is a vital link in the understanding of how proteins
function, which eventually leads to knowledge of the mechanisms
of genetic disease and suggests possible directions for rational
drug design.

PDB is based at DOE's Brookhaven National Laboratory and
supported primarily by NSF, with additional funds from the
National Institute of General Medical Sciences of NIH. Modest
user fees help cover the costs of distribution. Use of the
database has been growing rapidly and is predicted to continue
growing in parallel with human genome projects. Linking PDB with
databases that contain genetic map and sequence information will
enhance the long-term goals of human genome research (12).

Present and Future Needs 

The many types of information that are produced in molecular
biology necessitate the maintenance of a variety of specialized
databases. At the same time, however, the information in
different databases must often be combined in order to understand
the full dimensions of any specific research problem. It is
crucial for the scientific community to be able to access
information on a topic of interest from a variety of databases
that may handle different aspects of the problem. Thus databases
must use standardized or easily translatable formats, and they
must be interconnected. The problem of format has been recognized
and is being addressed in scientific meetings, by database
advisers, and by funding agencies. Several programs are underway
to improve the linkages between databases. An experimental
project at the National Library of Medicine, discussed below,
will develop a system to link a variety of databases relevant to
molecular biology.

The speed with which data are entered into the data- 
bases has been a major concern. The exponential in- 
crease in data has not always been matched by in- 
creases in the support for databases and personnel to 
operate them, causing a lag time of several months 
or even years between the publication of data and their 
entry, in fully annotated form, into databases. If the 
lag time is excessive, the efficiencies of centralized data 
management and retrieval are lost. One solution that 
is being explored is the direct submission of data to 
the databases by the researchers as a requirement for 
publication in journals. At least one journal has already 
agreed to cooperate with GenBank® and EMBL in an 
attempt to speed acquisitions in this way (19). Another 
possibility is to encourage funding agencies to make 
the submission of data or materials to the appropriate 
databases a condition of receiving research grants. The 
automation of data entry will be necessary as the 
amount of data increases. Automated methods are already
under development; the capacity to enter data
may be built into some automated sequencing ma- 
chines. 

The timely exchange of data is also affected by is- 
sues of intellectual property rights and technology 
transfer. Open and rapid exchange of information and 
materials speeds research and is particularly impor- 
tant when the data have medical or clinical implica- 
tions. If the data and materials become commercially 
valuable, however— and many researchers predict that 
they will— the values of open access and free exchange 
could clash with the desire to protect proprietary rights 
on potentially patentable data or materials. Because 
access to databases and repositories is international, 
there are concerns that U.S.-funded research could 
be commercialized by other countries. The problems 
are not intractable, however: There are several suc- 
cessful precedents of advance contracts that specify 
how data will be contributed to databases while pro- 
tecting property rights (4,21). (See also ch. 8.) 

A major problem faced by databases for the past decade has been
insufficient funding for handling the exponential increase of
data. Costs will continue to rise as more map and sequence data
are generated. The government agencies and other organizations
that support genome projects appear to recognize the importance
of continued funding for relevant databases. For example, the
increased budget in the new GenBank® contract (for 1987 through
1991) indicates that funding agencies are aware of the need to
enhance database maintenance. An initiative within the National
Library of Medicine to strengthen information resources for
molecular biology and biotechnology (discussed below) should lend
further support to databases needed for genome projects. The
Howard Hughes Medical Institute has been particularly active in
supporting database resources and networks to link them. It is
essential that financial support continue to keep pace with the
growing body of data.

Repositories 

Genome projects will generate biological materials as well as
sequence and map data. Access to these materials is a key element
in making the map information useful. A scientist searching for a
gene of unknown location would want to have access to a panel of
DNA markers that could give an approximate location, then a more
closely spaced set of markers to locate it more precisely. Once
the gene's location was established on the genetic map, the
investigator would select DNA clones covering that region of the
human chromosomes from a repository, thus obtaining the DNA
encoding the gene. Each of these steps would require access to a
set of cloned DNA fragments. Existing repositories are hardly
sufficient, but how much must be invested in them will depend on
conclusions about the value of centralized sources as compared
with housing materials in individual labs.

Companies developing a new product derived from or related to a
human gene would also wish to have access to such materials in
many instances. Storage and handling of such DNA resources is
thus a crucial function. The materials will be most widely useful
if they are stored at national collection and storage facilities.
DNA probes, vectors, and some other materials are best maintained
at a facility such as the American Type Culture Collection
(ATCC). Others, such as cell lines derived from individuals and
families with genetic diseases, are stored in the Human Genetic
Mutant Cell Repository in Camden, New Jersey. Other materials
that are unlikely to have substantial demand from a wide variety
of investigators might be stored at the laboratories that
generated them and distributed on a more informal basis to those
requesting them. Present methods and technologies for the
amplification, characterization, storage, and distribution of
materials are expensive and time-consuming; the costs of storage
could become a major component of mapping and sequencing
projects. Newer and cheaper storage methods will have to be
developed as production of DNA fragments increases. The
development of automated techniques for organizing, managing, and
accessing materials will be necessary; research on automated
repository management is already underway at ATCC and at DOE's
Los Alamos National Laboratory (11, 21).

Even with the advent of automated repository man- 
agement techniques, however, the high cost of stor- 
ing and maintaining materials makes the selection of 
materials to collect particularly crucial. While it might 
be desirable to keep large collections of clones generated
in an attempt to develop libraries of overlapping
clones or contigs (see ch. 2), the curators of reposi- 
tories and the scientists who use them will have to 
choose which materials are of utmost importance, and 
these decisions should be periodically reviewed (22,23).

American Type Culture Collection 

The ATCC maintains a variety of different collections 
of animal, plant, and bacterial cell lines, hybridomas, 
phage, and recombinant DNA vectors, as well as an 
NIH-sponsored repository of human DNA probes and
chromosome libraries (20). The collection of chromo- 
some libraries includes materials from DOE's National 
Gene Mapping Library (see ch. 5). The ATCC ampli- 
fies and stores samples and distributes them, along 
with pertinent information, to investigators for a nomi- 
nal fee. Investigators must agree not to use the mate- 
rials for commercial purposes nor to sell them. 

The repository maintains a database of information 
on the source and characteristics of the material in 
its collection. Its advisory committee has recommended 
that the database be included in a mapping database 
such as HGML. 

Human Genetic Mutant Cell Repository

Sponsored by the National Institute of General Med- 
ical Sciences of NIH, the Human Genetic Mutant Cell 
Repository was founded in 1972 to maintain a collec- 
tion of well-characterized human cell cultures (2,17). 
The cultures are available to investigators worldwide 
at a nominal fee. The repository contains over 4,000 
individual cultures, which represent more than 400 
genetic diseases and 700 to 800 chromosomal aberra- 
tions (7). The curators of the collection have increas- 
ingly sought to include material from multigenerational 
family groups for linkage analysis; the repository now 
maintains cell lines from the Venezuelan Huntington's 
pedigree (see box 7-A) and others such as cystic fibro- 
sis families, families with fragile X-linked mental 
retardation, and so on. 



Data Analysis, Informatics, and
Computer Resources 

Development of analysis methods to search for and compare
sequence information, to predict sequences that code for proteins
and the structures of those proteins, and to aid in other aspects
of the analysis of data from genome projects will eventually need
to utilize parallel processing techniques and the capacity of
supercomputers. Most researchers agree that the hardware to
tackle the complex problems of sequence analysis and comparison
already exists but that satisfactory software must be developed.
The DOE, the NIH, the NLM, and the NSF support various programs
and grants for the development of software to represent and
analyze data and for the development of computer resources such
as supercomputing centers and computer networks. Several of these
resources are described below. Numerous private firms are
developing or marketing computer programs that search databases
or analyze data on nucleic acid or protein sequences.
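
To make "search for and compare sequence information" concrete, the following is a minimal sketch of the two simplest operations: finding every exact occurrence of a short pattern in a sequence, and scoring the percent identity of two aligned sequences of equal length. The function names and the example sequences are assumptions made for illustration; the sketch is not the method used by any program or database named in this appendix, and practical tools must also handle gaps, mismatches, and very large data volumes.

```python
def find_all(pattern, sequence):
    """Return the 0-based start positions of every exact match.

    Overlapping matches are included.
    """
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        start = sequence.find(pattern, start + 1)
    return positions


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# GAATTC is the recognition site of the restriction enzyme EcoRI;
# the surrounding sequence is invented for the example.
sites = find_all("GAATTC", "AAGAATTCTTGAATTC")
identity = percent_identity("ACGT", "ACGA")
print(sites, identity)
```

Both functions scan the data position by position, so their cost grows with the length of the sequences searched, which is one reason the text anticipates a need for parallel processing as the databases grow.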

BIONET™

BIONET™ is a nonprofit computer network run by Intelligenetics,
Inc. (Mountain View, CA) and funded by the Division of Research
Resources of NIH and by modest user fees (13). Its goals are to
"provide computation assistance in data analysis and problem
solving to molecular biologists and researchers in related
fields, to serve as a focus for the development and sharing of
new software, and to promote rapid sharing of information and
collaboration among a national community of scientists" (9).
BIONET™ provides access to several major databases (GenBank®,
EMBL, PIR, PDB, and databases of restriction enzymes and plasmid
vectors) as well as to software for analyzing nucleic acid and
protein sequences. The network also aids communication between
its members through a series of bulletin boards on topics of user
interest and through an electronic mail system. BIONET™ serves
users in the United States, Canada, and Europe.

National Biotechnology Information 
Center 

The National Biotechnology Information Center is an initiative,
sponsored by the National Library of Medicine (NLM), to develop
and enhance a range of tools for molecular biology information
(18). The project is presently the subject of several
authorization bills but has already received some appropriations
for a range of projects, including building and maintaining
databases, developing a comprehensive listing of existing
databases, and improving information retrieval systems. NLM has
already developed a prototype of a retrieval system, called the
Information Retrieval Experiment (IRX), that connects data from
several different databases and graphic and visual sources. For
example, a database search for a specific disease gene will yield
information on whether the gene has been mapped, the map of the
gene in graphic form, bibliographic information on publications
about the map, as well as information on clinical symptoms,
diagnosis, and visual representations of affected patients
(X-rays, diagrams, photos, and so on). The NLM initiative will
enhance the management of data from genome projects and will
forge links between information from many areas of molecular
biology to aid in basic and biomedical research (15). The NLM is
in an advantageous position to coordinate database activities
through its expertise in handling information through existing
literature databases such as MEDLINE.

The Matrix of Biological Knowledge 
Workshop

The Matrix of Biological Knowledge Workshop, a 
month-long conference held during the summer of 
1987, was an attempt to formulate models and make 
recommendations for the organization of knowledge 
and data from all disciplines in biology (16). It was sponsored
by the NIH, the DOE, the Sloan Foundation, and
the Santa Fe Institute. 

The workshop grew out of the efforts of a commit- 
tee sponsored by the NIH that attempted to set forth 
and evaluate models used in biomedical research. Several
scientific meetings prior to the workshop had addressed
the particular complexities of biological data;
at the workshop, biologists, computer scientists, and 
database experts actually tried to work out some of 
the problems raised at earlier meetings. Participants 
at the workshop issued the following general recommendations:

. . . that support for a centrally coordinated effort to 
establish a knowledge base of databases in the biologi- 
cal sciences be aggressively pursued; that the current 
independent efforts to establish inter-database structures
and analysis tools be coordinated with a long-term
view towards maximum integration; . . . that these coordinated
efforts incorporate the most up-to-date computer
science and analytical methods; and finally, that
these activities directly involve the experimental and 
biotechnology communities in order to ensure the util- 
ity of the ensuing developments (16). 

These recommendations appear to reinforce the direction of
ongoing efforts in agencies that sponsor databases. The specific
recommendations issued by working groups in each of seven broad
categories may prove useful for the future management of
databases in all of biology.
Appendix E

Bibliometric Analysis of 
Human Genome Research 



Computer Horizons, Inc. (CHI) was hired by the Office of
Technology Assessment to conduct a bibliometric analysis of work
on human gene mapping, including an international bibliography of
the most relevant literature. This bibliography centered on an
examination of the growth of relevant scientific literature keyed
to the words "Gene or Genes or Genetic," "Marker or Linkage or
Map," and "Human." Additional key word combinations included
"Human Chromosome," "Human DNA Sequence," "Human Nucleic Acid
Sequence," "Human Restriction Fragment Length Polymorphism," and
various combinations designed to select papers on methods and
techniques of DNA analysis.

The use of publication counts as a measure of research activity
is part of the field of bibliometrics. A growing body of research
has demonstrated the usefulness of bibliometric techniques:
Counts of scientific papers and the numbers of citations to them
have been shown to be indicators of research productivity.
Limitations to bibliometric techniques do exist, particularly in
balancing the treatment of non-English publications. Since the
literature of science is dominated by English-speaking
researchers, there is an inherent bias against citations of
foreign-language publications. In the present case, however, the
primary purpose of the literature search was to provide
indicators of growth rather than to develop specific
bibliographies. As such, the analysis clearly demonstrated a
rapid growth in the scientific literature related to mapping and
sequencing the human genome, and an acceleration of this growth
over very recent years.

Over 11,000 entries of relevant literature were presented in the
bibliography, which scanned appropriate publications from 1977
through 1986. The literature search included journals published
in English, French, German, Dutch, Italian, Polish, Japanese,
Spanish, Russian, Bulgarian, Swedish, Finnish, Norwegian, Danish,
and Hebrew. All entries were subsequently grouped by OTA into the
country or region of origin to identify national and regional
trends in research. The regions included the United States,
Western European countries, Japan, and other non-European
countries. The table below presents the results. The data were
used as the basis for figures 7-1 and 7-2 and table 7-1 in
chapter 7, which display the growth in the total number of
articles on human gene mapping and sequencing and the breakdown
by country or region.

Annual Publications in Human Genetics:
Articles Published on Human Genes or Genetic Markers and Linkage Maps

Year                             1977   1978   1979   1980   1981   1982   1983   1984   1985   1986

United States                     187    218    235    314    308    364    487    577    689    818
Japan                               7     11     17     22     32     45     39     58     67     85
Western Europe
  Denmark                           6     14      9      7     12      7      9     12     21     25
  Federal Republic of Germany      20     23     14     33     42     41     51     69     78    100
  Finland                           3      6      8      7      7      5      9     12     15     14
  France                           21     34     36     42     59     57     64     70     94    114
  Italy                             8      8      9     15     24     15     44     39     44     66
  Netherlands                      15     25     17     20     13     18     25     25     50     45
  United Kingdom                   32     49     57     46     66     88     97    126    184    185
  Other                            31     19     46     30     45     44     62     69     70     92
Other non-European countries
  Australia                         2      8     11     17     18     22     24     23     20     38
  Canada                           12     17     17     28     14     29     26     38     60     68
  Eastern Europe and U.S.S.R.      23     17     21     38     36     33     36     51     60     62
  South Africa                      0      6      7      8      6      3      4      6     16      9
  Other                            20     20     35     33     33     41     32     42     87     63
Uncertain                          32     41     79     75     57     87     61     81     95    101

Total                             419    516    618    735    772    899  1,070  1,298  1,650  1,885

SOURCE: Office of Technology Assessment, 1988.
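
As a rough check on the "indicators of growth" that the table is meant to provide, the short sketch below computes three simple figures from the Total and United States rows: the compound annual growth rate over the whole period, the year-over-year change, and the U.S. share of articles in each year. The choice of these three measures and all variable and function names are assumptions made for illustration; they are not the indicators CHI or OTA report having used.

```python
# Rows copied from the table above.
years = list(range(1977, 1987))
total = [419, 516, 618, 735, 772, 899, 1070, 1298, 1650, 1885]
united_states = [187, 218, 235, 314, 308, 364, 487, 577, 689, 818]


def compound_annual_growth(first, last, periods):
    """Constant yearly rate that would carry `first` to `last` in `periods` years."""
    return (last / first) ** (1 / periods) - 1


# Nine yearly intervals separate 1977 from 1986.
growth = compound_annual_growth(total[0], total[-1], len(years) - 1)

# Fractional change in the total from each year to the next.
year_over_year = [(b - a) / a for a, b in zip(total, total[1:])]

# U.S. articles as a fraction of all articles, year by year.
us_share = [u / t for u, t in zip(united_states, total)]

print(f"Compound annual growth, 1977-86: {growth:.1%}")
for year, share in zip(years, us_share):
    print(year, f"U.S. share: {share:.1%}")
```

On these figures the total grows at a compound rate of about 18 percent per year, and the U.S. share stays between roughly 38 and 46 percent throughout the decade.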
Appendix G

Glossary 



Alleles: Alternative forms of a genetic locus; alleles 
are inherited separately from each parent (e.g., at 
a locus for eye color there might be alleles resulting 
in blue or brown eyes).

Amino acid: Any of a group of 20 molecules that com- 
bine to form proteins in living things. The sequence 
of amino acids in a protein is determined by the 
genetic code.

Autoradiography: A technique that uses X-ray film
to visualize radioactively labeled molecules or frag- 
ments of molecules; used in analyzing length and 
number of DNA fragments after they are separated 
by gel electrophoresis. 

Autosome: A chromosome not involved in sex deter- 
mination. The diploid human genome consists of 46 
chromosomes, 22 pairs of autosomes and 1 pair of 
sex chromosomes.

Base pair: Two nucleotides (adenosine and thymidine
or guanosine and cytidine) held together by weak
bonds. Two strands of DNA are held together in the
shape of a double helix by the bonds between base 
pairs. 

Centimorgan: A unit of measure of recombination frequency.
One centimorgan is equal to a 1 percent chance that a genetic
locus will be separated from a marker due to recombination in a
single generation. In human beings, 1 centimorgan is equivalent,
on average, to 1 million base pairs.

Cloning: The process of asexually producing a group 
of cells (clones), all genetically identical to the origi- 
nal ancestor. In recombinant DNA technology, the 
process of using a variety of DNA manipulation pro- 
cedures to produce multiple copies of a single gene 
or segment of DNA. 

Complementary DNA, cDNA: DNA that is synthesized
from a messenger RNA template; the single-strand 
form is often used as a probe in physical mapping. 

Contigs: Groups of clones representing overlapping, 
or contiguous, regions of a genome. 

Crossing over: The breaking during meiosis of one
maternal and one paternal chromosome, the exchanging
of corresponding sections of DNA, and the
rejoining of the chromosomes. 

C-value paradox: The lack of correlation between the
amount of DNA in a haploid genome and the bio- 
logical complexity of the organism. (C-value refers 
to haploid genome size.) 

Determinism: The theory that for every action taken
there are causal mechanisms such that no other ac-
tion was possible.

Diploid: A full set of genetic material (two paired sets
of chromosomes), one from each parental set. All
cells except sperm and egg cells have a diploid set
of chromosomes. The diploid human genome has
46 chromosomes. Compare haploid.

DNA, deoxyribonucleic acid: The molecule that en-
codes genetic information. DNA is a double-stranded
molecule held together by weak bonds between base
pairs of nucleotides. There are four nucleotides in
DNA: adenosine (A), guanosine (G), cytidine (C), and
thymidine (T). In nature, base pairs form only be-
tween A and T and between G and C, thus the se-
quence of each single strand can be deduced from
that of its partner.

DNA probes: Segments of single-strand DNA that are
labeled with a radioactive or other chemical marker
and used to identify complementary sequences of
DNA by hybridizing with them. See hybridization.

DNA sequence: The relative order of base pairs,
whether in a stretch of DNA, a gene, a chromosome,
or an entire genome.

Domain: A discrete portion of a protein with its own
function. The combination of domains in a single
protein determines its unique overall function.

Double helix: The shape in which two linear strands
of DNA are bonded together.

Electrophoresis: A method of separating large
molecules (such as DNA fragments or proteins) from 
a mixture of similar molecules. An electric current 
is passed through a medium containing the mixture, 
and each kind of molecule travels through the 
medium at a different rate, depending on its elec- 
trical charge and size. Separation is based on these 
differences. 

Enzyme: A protein that acts as a catalyst, speeding 
the rate at which a biochemical reaction proceeds 
but not altering its direction or nature.

Eukaryote: Cell or organism with membrane-bound, 
structurally discrete nucleus and other well- 
developed subcellular compartments. Eukaryotes in- 
clude all organisms except viruses, bacteria, and 
blue-green algae. Compare prokaryote. 

Eugenics: Attempts to improve hereditary qualities 
through selective breeding. See positive eugenics,
negative eugenics, eugenics of normalcy. 

Eugenics of normalcy: Policies and programs in- 
tended to ensure that each individual has at least 
a minimum number of normal genes. 

Exons: The protein-coding DNA sequences of a gene.
Compare introns.

Gamete: Mature male or female reproductive cell with
a haploid set of chromosomes (23); that is, a sperm
or ovum.

Gene: The fundamental physical and functional unit
of heredity. A gene is an ordered sequence of nucleo-
tides located in a particular position on a particular
chromosome. See gene expression. 

Gene expression: The process by which a gene's blue- 
print is converted into the structures present and 
operating in the cell. Expressed genes include those 
that are transcribed into mRNA and then translated 
into protein and those that are transcribed into RNA 
but not translated into protein (e.g., transfer and
ribosomal RNAs). 

Gene families: Groups of closely related genes that 
make similar products. 

Gene product: The biochemical material, either RNA
or protein, made by a gene. The amount of gene 
product is used to measure how active a gene is; 
abnormal amounts can be correlated with disease- 
causing genes. 

Genetic code: The sequence of nucleotides, coded in 
triplets along the mRNA, that determines the se- 
quence of amino acids in protein synthesis. The DNA 
sequence of a gene can be used to predict the mRNA 
sequence, and the genetic code can in turn be used 
to predict the amino acid sequence.
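
The two prediction steps named in the genetic code entry (DNA to mRNA, then mRNA triplets to amino acids) can be sketched as below. This is a minimal illustration, not a complete tool: the function names are hypothetical, the codon table lists only a few of the 64 triplets, and the input is assumed to be the coding strand of the gene, so the mRNA is obtained simply by writing U in place of T.

```python
# Partial codon table (mRNA triplet -> amino acid). The full genetic code
# has 64 triplets; only a handful are listed here for illustration.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UGU": "Cys",
    "CAU": "His", "GAA": "Glu", "CAA": "Gln",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(coding_dna):
    """Predict the mRNA sequence from the coding DNA strand (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein
```

A triplet missing from the partial table raises a KeyError, which keeps the sketch from silently guessing an amino acid.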

Genetic engineering technologies: See recombinant
DNA technologies. 

Genetic linkage map: A map of the relative positions
of genetic loci on a chromosome, determined on the
basis of how often the loci are inherited together.
Distance is measured in centimorgans.

Genetics: The study of the patterns of inheritance of 
specific traits. 

Genome: All the genetic material in the chromosomes 
of a particular organism; its size is generally given 
as its total number of base pairs. 

Genome projects: Research and technology develop-
ment efforts aimed at mapping and sequencing some
or all of the genome of human beings and other 
organisms. 

Genomic library: A collection of clones made from 
a set of overlapping DNA fragments representing 
the entire genome of an organism. Compare library. 

Haploid: A single set of chromosomes (half the full 
set of genetic material), present in the egg and sperm 
cells of animals and in the pollen cells of plants. Hu- 
man beings have 23 chromosomes in their repro- 
ductive cells. Compare diploid. 

Homeobox: A short stretch of nucleotides whose se- 
quence is virtually identical in all the genes that con- 
tain it. It has been found in many organisms, from 
fruit flies to human beings. It appears to determine
when particular groups of genes are expressed in
the development of the fruit fly.

Human gene therapy: Insertion of normal DNA 
directly into cells to correct a genetic defect. 

Human Genome Initiative: Collective name for sev- 
eral projects begun in 1986 by DOE to 1) create an 
ordered set of DNA segments from known chromo- 
somal locations, 2) develop new computational meth- 
ods for analyzing genetic map and DNA sequence 
data, and 3) develop new techniques and instru- 
ments for detecting and analyzing DNA. 

Hybridization: The process of joining two complemen- 
tary strands of DNA, or of DNA and RNA, together 
to form a double-stranded molecule. 
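
The pairing rule given under DNA (A with T, G with C) is what makes hybridization predictable, and it can be sketched in a few lines. The names below are hypothetical, and the sketch assumes exact matches only and that the partner strand is read in the opposite direction, a detail the glossary does not state.

```python
# Watson-Crick pairing: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(sequence):
    """Return the partner strand, read in the opposite direction."""
    return "".join(PAIRING[base] for base in reversed(sequence.upper()))


def can_hybridize(probe, target):
    """True if the probe is exactly complementary to some stretch of target."""
    return complementary_strand(probe) in target.upper()
```

Real probes tolerate some mismatches depending on laboratory conditions; the exact-match test here is a simplification.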

Informatics: The study of the application of computer 
and statistical techniques to the management of in- 
formation. In genome projects, informatics includes 
the development of methods to search databases 
quickly, to analyze DNA sequence information, and 
to predict protein sequence and structure from DNA 
sequence data. 

International technology transfer: Movement of in-
ventions and technical know-how across national
borders.

Introns: The DNA sequences interrupting the protein- 
coding sequences of a gene that are transcribed into 
mRNA but are cut out of the message before it is 
translated into protein. Compare exons. 

Karyotype: A photomicrograph of an individual's chro- 
mosomes arranged in a standard format showing 
the number, size, and shape of each chromosome; 
used in low-resolution physical mapping to corre- 
late gross chromosomal abnormalities with the char- 
acteristics of specific diseases. 

Library: A collection of clones in no obvious order 
whose relationship can be established by physical 
mapping. Compare genomic library. 

Linkage: The proximity of two or more markers (e.g., 
genes, RFLP markers) on a chromosome; the closer
together the markers are, the lower the probability 
that they will be separated during meiosis and hence 
the greater the probability that they will be inherited 
together. 

Locus: The position on a chromosome of a gene or 
other chromosome marker; also, the DNA at that
position. Some restrict use of locus to regions of DNA 
that are expressed. See gene expression. 

Marker: An identifiable physical location on a chro-
mosome (e.g., restriction enzyme cutting site, gene,
RFLP marker) whose inheritance can be monitored. 
Markers can be expressed regions of DNA (genes) 
or some segment of DNA with no known coding 
function but whose pattern of inheritance can be 
determined.

Meiosis: The process of two consecutive cell divisions
in the diploid progenitors of sex cells. Meiosis re- 
sults in four rather than two daughter cells, each 
with a haploid set of chromosomes. 

Messenger RNA; mRNA: A class of RNA produced by
transcribing the DNA sequence of a gene. The mRNA
molecule carries messages specific to each of the
20 amino acids. Its role in protein synthesis is to
transmit instructions from DNA sequences (in the
nucleus of the cell) to the ribosomes (in the cytoplasm
of the cell).

Multifactorial or multigenic disorders: See polygenic
disorders.

Mutation: Any change in DNA sequence that results 
in a new characteristic that can be inherited. Com-
pare polymorphism.

Negative eugenics: Policies and programs intended 
to reduce the occurrence of genetically determined 
disease. 

Nucleotide: A subunit of DNA or RNA consisting of
a nitrogenous base (adenine, guanine, thymine, or 
cytosine in DNA; adenine, guanine, uracil, or cyto- 
sine in RNA), a phosphate molecule, and a sugar mol- 
ecule (deoxyribose in DNA and ribose in RNA). Thou-
sands of nucleotides are linked to form the DNA or
RNA molecule. See DNA, base pair, RNA. 

Oncogene: A gene, one or more forms of which is asso-
ciated with cancer. Many oncogenes are involved,
directly or indirectly, in controlling the rate of cell 
growth. 

Physical map: A map of the locations of identifiable 
landmarks on DNA (e.g., restriction enzyme cutting 
sites, genes, RFLP markers), regardless of in-
heritance. Distance is measured in base pairs. For
the human genome, the lowest-resolution physical
map is the banding patterns of the 24 different chro-
mosomes; the highest-resolution map would be the
complete nucleotide sequence of the chromosomes.

Polygenic disorders: Genetic disorders resulting from
the combined action of alleles of more than one gene
(e.g., heart disease, diabetes, and some cancers). Al-
though such disorders are inherited, they depend
on the simultaneous presence of several alleles, thus
the hereditary patterns are usually more complex
than those of single-gene disorders. Compare single-
gene disorders.

Polymorphism: Difference in DNA sequence among
individuals. Genetic variations occurring in more
than 1 percent of a population would be considered
useful polymorphisms for genetic linkage analysis.
Compare mutation.

Positive eugenics: The achievement of systematic or
planned genetic changes to improve individuals or
their offspring.



Prokaryote: Cell or organism lacking membrane-
bound, structurally discrete nucleus and subcellu-
lar compartments. Bacteria are examples. Compare
eukaryote.

Protein: A large molecule composed of chains of
smaller molecules (amino acids) in a specific se-
quence; the sequence is determined by the sequence
of nucleotides in the gene coding for the protein.
Proteins are required for the structure, function,
and regulation of the body's cells, tissues, and or-
gans, and each protein has a unique function. Ex-
amples are hormones, enzymes, and antibodies.

Recombinant DNA technologies: Procedures used to
join together DNA segments in a cell-free system (an
environment outside of a cell or organism). A re-
combinant DNA molecule can enter a cell and repli-
cate there, either autonomously or after it has be-
come integrated into a cellular chromosome.

Replication: The synthesis of new DNA strands from 
existing DNA. In human beings and other eukary- 
otes, replication occurs in the nucleus of the cell. 

Resolution: Degree of molecular detail on a physical 
map of DNA, ranging from low to high. 

Restriction enzyme, endonuclease: A protein that 
recognizes specific, short nucleotide sequences and 
cuts DNA at those sites. There are over 400 such
enzymes in bacteria that recognize over 100 differ- 
ent DNA sequences. See restriction enzyme cutting 
site. 

Restriction enzyme cutting site: A specific nucleo- 
tide sequence of DNA at which a restriction enzyme 
cuts the DNA. Some sites occur frequently in DNA 
(e.g., every several hundred base pairs), others much
less frequently (e.g., every 10,000 base pairs). 

RFLP, restriction fragment length polymorphism:
Variation in DNA fragment sizes cut by restriction
enzymes; polymorphic sequences that are respon-
sible for RFLPs are used as markers on genetic link-
age maps.
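
How a sequence difference at a cutting site produces fragments of different lengths can be sketched as follows. The function name and the toy sequences are hypothetical. The example assumes the restriction enzyme Hpa I, whose recognition site GTTAAC and cut in the middle of that site are well-known facts not given in this glossary, and it treats DNA as a single linear strand of text.

```python
def fragment_lengths(dna, site, cut_offset):
    """Cut dna at every occurrence of site; return fragment lengths in order.

    cut_offset is the position inside the site where the enzyme cuts
    (3 for a cut in the middle of a six-base site).
    """
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    bounds = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Two toy alleles: the second has one base changed in the second site,
# so the enzyme no longer cuts there and one longer fragment results.
allele_with_site = "AAA" + "GTTAAC" + "CCCC" + "GTTAAC" + "TT"
allele_without_site = "AAA" + "GTTAAC" + "CCCC" + "GTTCAC" + "TT"
```

Separating the two digests by gel electrophoresis would show the different fragment lengths, which is the pattern followed as an RFLP marker.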

Ribosomal RNA; rRNA: A class of RNA found in the
ribosomes of cells.

RNA; ribonucleic acid: A chemical found in the nu- 
cleus and cytoplasm of cells; it plays an important 
role in protein synthesis and other chemical activi-
ties of the cell. The structure of RNA is similar to
that of DNA. There are several classes of RNA
molecules, including messenger RNA, transfer RNA,
ribosomal RNA, and other small RNAs, each serving
a different purpose.

Sex chromosomes: The X and Y chromosomes in hu- 
man beings that determine the sex of an individual. 
Females have two X chromosomes in diploid cells; 
males have an X and a Y chromosome. 

Single-gene disorders: Hereditary disorders caused
by a single gene (e.g., Duchenne muscular dystrophy,
retinoblastoma, sickle cell disease). Compare poly-
genic disorders.

Somatic cells: Any cells in the body except reproduc- 
tive cells and their precursors. 

Technology transfer: The process of converting sci-
entific knowledge into useful products.

Transcription: The synthesis of mRNA from a se- 
quence of DNA (a gene); the first step in gene ex- 
pression. Compare translation. 

Transfer RNA, tRNA: A class of RNA having structures
with triplet nucleotide sequences that are com-
plementary to the triplet nucleotide coding se-
quences of mRNA. The role of tRNAs in protein syn-
thesis is to bond with amino acids and transfer them
to the ribosomes, where proteins are synthesized
according to the instructions carried by mRNA.

Translation: The process in which the genetic code
carried by mRNA directs the synthesis of proteins 
from amino acids. Compare transcription. 

Vector: DNA molecule originating from a virus, a bac-
terium, or the cell of a higher organism used to carry
additional DNA base pairs; vectors introduce for-
eign DNA into host cells, where it can be reproduced
in large quantities. Examples are plasmids, cosmids, 
and yeast artificial chromosomes.

97

cell receptors, 21-22 

Center for the Study of Human Polymorphism 

collaborative efforts of, 8, 106, 143-144, 146, 149

family pedigree data set, 58, 143, 146 

funding for, 106 

mission, 145 
Centers for Disease Control, 103 
chicken, haploid DNA content, 25 
Chiles, Lawton, 97 

cholesterol, low-density lipoprotein, 58 
chromosome markers

for single-gene diseases, 57 

funding for research on, 6 

mapping through family inheritance patterns, 27 

maps, low-resolution, 4 

use for genetic linkage studies, 6, 27, 28, 44 
chromosomes 

banding patterns, 30, 33, 34-35, 40, 42, 45, 56 

crossing over (recombination), 26, 27 

deletion of, 25, 31, 32 

diploid number, 21

Drosophila melanogaster salivary gland, 30, 32
duplication, 25
E. coli, 41

gene assignment to, 31 
haploid number, 21 
hybrids, single, 31 
inversion, 26 

isolation techniques, 31-32, 43 
of clinical significance, 44 
phage lambda, 36, 39, 42, 56, 100
polytene, 42 

sorting, 31-32, 37, 47, 56, 97, 100
species similarities in, 34-35, 36
translocation, 25-26, 31, 32
yeast artificial, 36-37, 39, 43, 56 
chromosomes, human 
1, 27 
4, 28 

6, 148 

7, 31, 44, 62 
9, 33, 148 
10, 32, 33
11, 32

13, 34 

16, 31, 44, 100, 148 

17, 31, 33 

19, 31, 44, 100 

21, 35, 44, 95, 100, 145-146 

22, 44, 145 
average size, 37 
number, 3 

resemblance to primate chromosomes, 34-35 

X, 24, 27, 30, 31, 44, 63, 100, 148 

Y, 24, 30, 31, 145 
chronic granulomatous disease, 57, 59, 62 
Church, George, 44-45
cloning/clones

access to and ownership of, 147 



automation of, 47, 48 
banding patterns, 40 
cDNA, 59-63

disease-associated genes, 57, 59, 60 
of DNA fragments, 39, 72 
drug development through, 62-63 
E. coli, 41

fingerprinting method for ordering, 42
fruit fly chromosomes, 42 
gene isolation by, 31, 59 

libraries of, 39, 42, 59, 62-63; see also contig maps

microdissection, 42 

NIH grants for, 94 

ordering of, 38-40, 41, 42 

overlapping, 39, 42, 62, 100 

phage lambda chromosomes, 36, 39, 42, 56, 100 

repositories, 97, 101, 115 

S. cerevisiae, 42

vectors, 35-37, 38-39, 42, 56, 67, 100 

yeast artificial, 36-37, 39, 43, 56, 157 
Cold Spring Harbor Laboratories Conference, 6 
collaboration on genome research 

by Australia, 148 

by Center for the Study of Human Polymorphism, 8, 

106, 143, 146, 149, 157 
center-based vs. networking, 156 
databases and repositories, 8, 139, 158-159 
DOE, 157 

existing frameworks, 155-157 
European, 142, 150 

International Human Gene Mapping Workshops, 29, 
157 

international journals, 157-158 

organizational options, 152-155

precedents for international scientific programs, 
150-152 

views on, 152-153 

Washington University-RIKEN, 157 
Collaborative Research, Inc. 

DNA probe development, 58, 108 

RFLP linkage map, 6, 30 
collagen, 61, 67 
color-blindness, 21 
Columbia University 

mapping of E. coli genome, 41

mapping of human chromosome 21, 44, 100 
Compton, Arthur Holly, 99 
computers, computational methods, and software 

artificial intelligence, 96

costs, 180-181 

for DNA sequencing, 65, 97 

gene mapping applications to, 57, 146 

networking, 156 

NIH funding for improvements in, 95, 96 

see also databases; informatics 
Concertation Unit for Biotechnology in Europe, 140 
consortia 

of Federal/private interests, authority for, 16, 121 
funding, 122 
goals, 121 

intellectual property rights, 122
Midwest Plant Biotechnology Consortium, 122
national, to administer genome projects, 14-15, 121-123
peer review, 122 
two-tiered system, 122 
contig mapping/maps 
construction, 39, 40 

correlation with large-fragment restriction maps, 42
forward genetics applications, 61 
nematode, 42, 43 
reverse genetics applications, 62 
strategies, 43-44 
yeast, 42 
controversial issues 
Big Science vs. small science, 125-128 
DNA sequencing, extent of, 4, 6, 44, 57, 79, 81 
feasibility of genome mapping, 4 
quotes on, 126 

resolution of genome mapping, 3, 79, 81, 88

see also ethical issues
corn, haploid DNA content, 25
Coulson, Alan, 44-45, 147-148 
Crick, Francis H.C., 3, 21
cysteine, codon, 23
cystic fibrosis, 57, 58, 62, 149
cytidine, 21-22 

Dana Farber Cancer Institute, 97
databases 

access to and ownership of, 12, 81, 102, 128, 134, 139,
146; see also technology transfer
Cell Line Two-Dimensional Gel Electrophoresis, 97 
CODATA Hybridoma Databank, 141 
DNA Data Bank of Japan, 139, 158
DNA fingerprints, 80 
European support of, 141 
funding for, 7, 12, 96-97, 141, 190 
Genatlas, 157 

GenBank, 46, 96, 98, 109, 115, 139, 142, 154, 158,
190 

genetic maps, 24, 98, 106, 189-190 
government protection of, 87
HHMI, 7, 8, 98, 106 

Human Gene Mapping Library, 106, 189-190 
importance, 4, 9 

international collaboration on, 8, 139, 158-159 
Japan Protein Information Database, 159 
linking of, 98 
management of, 12 

Martinsried Institute for Protein Sequence data, 159
MEDLARS/MEDLINE, 97 
mouse genetics, 106 

National Library of Medicine, 7, 8, 12, 96, 97 
needs for, 191-192 

nucleotide sequence data, 46, 96, 98, 141, 158, 190
On-line Mendelian Inheritance in Man, 24, 98, 106,
189-190
Protein Data Bank, 190-191
Protein Identification Resource, 97, 98, 158-159, 
190-191 
Dausset, Jean, 145-146
DeLisi, Charles, 100, 153

Denmark, national genome research efforts, 133, 143, 
195 

deoxyribonucleic acid, see DNA listings 
Department of Defense, biomedical research resources, 
104 

Department of Energy 
funding for genome projects, 7, 96, 100 
Health and Environmental Research Advisory Commit- 
tee report, 101-102 
interest in massive sequencing, 9 
international research collaboration, 157 
as lead agency for genome projects, 12, 14, 116, 117 
mission, 7, 99-100 

Office of Health and Environmental Research, 99-101 
organization, 99, 117-118 
peer review, 101, 118 

recommendations for genome projects, 4, 11, 100 

research supported by, 100, 117 

workshops sponsored by, 6, 100

see also Human Genome Initiative 
determinism, effect of genome mapping on, 86 
development, see human physiology and development 
diabetes, 62 
diseases 

infectious, 64 

linking mapping and sequencing data to, 104 
see also genetic diseases; and specific diseases 
DNA 

amount relative to organism complexity, 24-25 
C-value paradox, 24-25 
cloning in plasmids, 36-37, 39 
complementary, see cDNA 
discovery, 3 

electrophoretic separation of, 37-39 

expendable fraction, 25, 57 

fingerprints, 89; see also genetic screening 

fragmentation of, 37-39 

mitochondrial, 71

oldest human samples, 72 

polymerase, 45 

recombinant technology, see recombinant DNA tech- 
nology 

replication process, 21-22

structure, 3, 21-22 

transcription to mRNA, 23-24 

see also chromosomes 
DNA markers, see chromosome markers 
DNA probes 

automated synthesis of, 47, 48 

cDNA, 28-29, 32, 33, 59-61, 63 

companies developing, 58 

fluorescently labeled, 46-48

for genetic disease diagnosis, 58-59 

in in situ hybridization, 33 

number needed to complete human linkage map, 29 
oligonucleotides, 48 

radioactively labeled, 28-29, 32-33, 40, 44-46, 58
reliability, 58 

for RFLP markers, 28-29, 56, 58, 61-62
synthetic, 48, 59-60

use to clone genes, 60
DNA Segment Library, 97 
DNA sequence/sequencing 

automation of, 47-48 

commercialization, 82-83, 133, 138-139

computer-assisted, 65

controversies, 4, 6, 44, 79; see also ethical issues 

costs, 6, 182-183 

database, 46, 96, 98, 190

definition, 3, 21 

directly from genomic DNA, 45 

E. coli, 41, 100

enhanced fluorescence detection method, 46, 47 

exons, 59, 61, 63, 65, 69 

expenditures, federal, 8 

facilities for, 13 

government role in, 87 

homeo box, 67-68

importance, 9 

introns, 25, 30, 61, 65, 69-70 
longest stretch determined, 46 
of mitochondria, 71 
multiplex, 44-46 

mutation detection applications, 56 
NIH funding for, 8 
rate, 46 

repeated, 25, 28, 43, 57

RFLP mapping required for, 37-39 

scanning tunneling microscopy for, 46

selective amplification without prior cloning, 45-46 

species comparisons, 68-70 

steps, 47 

strategies, 44-45 

technologies, 44-47 

variations, 28, 29 

VNTR, 29 

Domestic Policy Council, 8, 105, 109, 119 
Donis-Keller, Helen, 30 
Down's syndrome, 32, 35, 58, 95, 146 
Drosophila melanogaster

amount sequenced, 47 

genome mapping, 42-43 

genome size, 47 

salivary gland chromosomes, 30, 33 
drugs and pharmaceuticals, development, 62-63 
Duchenne muscular dystrophy, 57-59, 61-63 
Duffy blood group, 27 
Dulbecco, Renato, 100, 126, 145
dwarfism, 64 
dystrophin, 63 

EG&G Biomolecular, automated DNA sequencer, 47
E.I. du Pont de Nemours & Co., automated DNA se-
quencer, 47
electrophoresis, see gel electrophoresis 
England, see United Kingdom 
enzymes 

functions, 22 

see also specific enzymes
epidermal growth factor, 64
Epstein-Barr virus, 46 
erythropoietin, 64 
Escherichia coli 

amount sequenced, 47 

genome mapping, 41, 43, 100

genome size, 47 
ethical issues 

academic freedom, 87 

access to and ownership of databases and repositories, 
16, 82, 88 

access to and use of genetic information, 79-80 
attitudes and perceptions of ourselves and others, 
85-86 

commercialization, 16, 82-83, 133 
diagnostic/therapeutic gap, 83 
eugenics, 81, 84-85, 88, 143-144 
genetic fingerprinting, 80 

government role in mapping and sequencing, 87, 88

international competitiveness, 87-88, 133 

physician practice, 83 

reproductive choices, 83-84, 88 

responsibility for considering, 123-124 
eugenics 

negative, 85, 143-144 

of normalcy, 85 

positive, 84-85 
eukaryotes, 70-71 

Europe, Eastern, interest in genome projects, 8, 133, 
143, 195 

Europe, Western, genome sequencing and mapping 
activities, 139-148; see also specific countries and 
organizations 

European Economic Community, genome research, 139- 
141, 153 

European Molecular Biology Laboratory, 8, 139, 141-142, 
158 

European Molecular Biology Organization, 8, 141
European Research Coordination Agency, 142, 145-146 
European Science Foundation, 142, 156 
evolution, see molecular evolution 

facilities for genome research 
bioprocess engineering, 102 
data handling, European needs, 142 
DOE funding for, 100 

flow cytometry, 32, 97; see also specific national lab- 
oratories 
need for, 10, 128 

NSF biology centers, 8, 102-103, 109 
factor IX, 61 
factor VII, 61
factor VIII, 64

familial hypercholesterolemia, 56, 58, 149 
family pedigree projects 

CEPH data set on, 58, 136, 146 

Danish, 143 

Egyptian, 134, 136 

on mental illness, 156 

South African, 149
use in genetic linkage mapping, 17, 33, 58, 61
Venezuelan, Huntington's disease, 63-64, 134-136, 143,
146
fatalism, 86 

Federal Advisory Committee Act, 124 

Federal Republic of Germany, genetics research, 8, 133,
143-144, 195
fibroblast growth factor, 64

Finland, national genome research effort, 133, 144 
flow cytometry 
enhanced fluorescence detection in, for DNA sequenc- 
ing, 46 

extraction of whole chromosomes by, 37 
facility, 32 
France 

Center for the Study of Human Polymorphism, 33, 58,
144-146

genome projects, 8, 144-145 
published genome research, 133, 195 
fruit fly 

developmental regulation in, 67 
Drosophila melanogaster, 30, 33, 42-43, 47
genome mapping, 42-43 
haploid DNA content, 25 
human DNA sequences compared with, 68 
lethal mutations in larval stage, 42 
funding for genome projects 
advisory body for determining, 124 
of consortia, 122 
databases, 7, 12, 96-97, 190 
determinants, 98 

determinants of congressional appropriations, 11-12 

DNA marker studies, 6 

DOE, 7, 96, 100, 118, 190 

European Economic Community, 139-143 

HHMI, 7, 190 

international, 8 

NIH, 6, 7, 94-98, 117, 155, 190 

NSF, 7, 8, 96, 190 

pluralism in, 13, 15, 119 

priority setting, 10 

private vs. federal, 79, 83 

recommendations, 4, 11-12, 107 

through a lead agency, effects of, 12-13 

through a national consortium, 14

USDA, 190 

West German, 144 

Gall, Joseph, 126 
Galton, Francis, 84
gamma interferon, 64 
gel electrophoresis 
database, 97 

DNA separation for physical mapping, 37, 39, 45 

polyacrylamide, 45 

pulsed-field, 37, 44, 56

in RFLP mapping, 28, 37, 58
GenBank, 46, 96, 98, 109, 115, 139, 142, 154, 158, 190
gene expression 

control of, 57 



steps in, 23-24 
study centers, 144 
gene products 
functions of, 67 

with potential as therapeutic agents, 62, 64 
gene therapy, 64, 141 
genes

biochemical identification, 62 

in a chromosome band, number, 33 

color-blindness, 21

definition, 3, 21, 24 

dosage mapping, 34 

encoding ribosomal RNAs, detection, 33 
expressed, 24, 30 
families of, 25, 70 

functions, approaches to understanding, 66-67, 73 
homeotic, 68 

isolation techniques, 31, 33, 59-62 
largest, 63 

linked, 26, 34; see also genetic linkage mapping/maps 
mapping, see genetic linkage maps 
species similarities in, 34
structure/function relationships, study of, 144
see also human genes
genes, human 
aldolase, 33 

chromosomal locations known, 4 

number of loci identified, 24, 30 

number per haploid genome, 24 

sizes, 61 
genetic code

definition, 21-24 

for amino acids, 23 
genetic diseases 

chromosomal locations of genes for, 4 

clinical services for, 100 

companies developing DNA probes for diagnosis of, 58 
correlating gross chromosomal abnormalities with, 32 
diagnostic information, physician handling of, 83 
family pedigree studies, 61, 63 
HHMI support of research on, 8 
isolation of genes associated with, 59-62 
mechanisms, 4 

not associated with biochemical defects, 61 
polygenic, 62 

RFLP markers for, 28, 56, 58 

single-gene, 57, 88 

see also specific diseases 
genetic information 

access to and use of, 79-80, 82, 84 

causes of changes in, 25-26 

insurer use of, 81, 83 

organization and function, 21-26 
genetic linkage mapping/maps

autoradiography use, 19

autosomes, 27 

costs, 181-182 

databases, 24, 98, 106, 189-190 
disease diagnosis applications, 56, 58, 62 
distance measurements on, 27
early attempts, 4, 6, 21

electrophoretic technology in, 28 

family pedigree data in, 27, 33, 58, 61

HHMI funding for, 7

medical applications, 56, 58, 62-64 

number of markers needed to complete, 29-30 

projects to link physical maps with, 181-182

purpose, 26-27 

recombinant DNA technology use in, 28 
resolution, 62 

reverse genetics applications, 62 

of RFLP, 28-30, 62 

somatic cell hybridization for, 27 

X chromosome, 27 
genetic locus, see chromosome marker 
genetic screening 

ethical questions about, 80, 88 

for missing children, 80

for proof of paternity, 80 
genetic selection, see eugenics 
genetics 

definition, 21 

forward, 59-61, 62-63 

HHMI funding for, 7 

molecular, NIH research resources activities related to,
97

NIH funding for, 95
population, 72-73 
reverse, 59, 61-62 
Genetics Institute, robotic devices for DNA sequencing, 
48, 108 

Genome Corp., physical mapping project, 108 
genome mapping 

agricultural applications, 73 

application in developmental studies, 42, 65

automation, 47 

determinism and, 86 

distance measurements in, 40 

evolutionary applications, 68-72 

facilities, 13 

importance, 9

international efforts and cooperation, 8, 9, 150-159; 
see also specific countries 

resolution levels in, 56, 79

see also genetic linkage mapping/maps; physical
mapping 
genome mapping, human 

commercialization, 82-83, 138-139

controversies, 3, 4, 6, 9, 44, 55, 57, 102 

government role in, 87 

priorities for, 88 

scale of efforts, 5, 24 

strategies, 43-46 
genome mapping, nonhuman 

bacteria, 4, 40, 41, 44 

fruit fly, 42-43, 44 

importance, 9, 44, 107 

international efforts, 8, 42 

nematodes, 4, 42, 44
plants, 73, 136, 149
yeast, 4, 41-42, 44 
genome projects 
accountability to Congress, 13, 14, 124 
administration of, 12-15, 115-123, 184 
advisory board structure for, 123-124 
appropriations for, see funding 
benefits, 11, 55, 56, 133, 172-174 
Big Science v. small science approach, 120, 125, 
127-128 

center-based vs. networking, 156 

collaboration on, 150-159, see also collaboration on ge- 
nome research 

commercialization potential, 82-83, 133, 138-139, 151, 
165 

common features, 7 
component nature, 4, 6, 10 
congressional oversight, 15-17
congressional role in, 11-17
consortium structure for, 14-15, 121-122 
cooperation among agencies, 9, 15, 118-119 
costs, 11-12, 4, 47, 180-185 
definition, 4 

displacement of other research by, 102, 125 
duplication of efforts, 13, 82, 105
early estimates of costs for, 184-186
economic impacts, 165, 172
ethical considerations, 79-88
expenditures, Federal, 8
facilities, 10, 13 
focus, 7-9 

funding, see funding for genome projects 
interagency coordination and communications, 8, 11, 
123 

interagency task force oversight of, 14, 119-121 
international efforts on, 133-159; see also specific 
countries 

lead agency concept, 12-14, 115, 116-118 
legislation, 12, 14, 123 
manpower availability, 10 

medical applications, 56-64; see also disease; medicine 

military applications, 174 

misconceptions about, 9-10 

national prestige associated with, 174 

objectives, 7, 9, 55 

organization of, 12-15 

organizations involved in, 6, 7 

policy development for, 134 

political interference with, 127-128 

quality control and reference standards, 103, 127, 183 

resource allocation for, 10; see also funding for
genome projects

scope of, 10, 134 

training of personnel, 183-184 

U.S. competitiveness and, 11, 133

see also genome mapping; DNA sequences/sequencing; 
Human Genome Initiative; pilot projects
genome, human 

amount sequenced, 46-47 
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bibliometric analysis of research on, 133, 157-158, 195

size, 24, 43, 46-47
genomes 

bacteriophage T4, 40 

definition, 3, 21

Epstein Barr virus, 46 

mitochondrial, 71 

organization, 21 

regeneration, 21 

size, 21, 24-25, 43 

smallest, 148 
genomic library, 35 

Germany, see Federal Republic of Germany 
Gilbert, Walter, 44-46, 126, 153, 156
glutamic acid, codon, 23 
glutamine, codon, 23
glycine, codon, 23 

granulocyte colony stimulating factor, 64 
guanosine, 21-22 
Gusella, James, 136 

Harvard University, DNA sequencing, 44, 100 
heart disease, 58, 62, 64 
hemophilia, 56, 57, 58, 64 
high-mobility group CoA reductase, 61 
Hill, Lister, 97 
histidine, codon, 23 

Hitachi, Ltd., automated DNA sequencer, 47 
Hood, Leroy, 47, 126 
hormones, 21 

Howard Hughes Medical Institute 
as lead agency for genome projects, 13 
budget, 8 

collaboration with CEPH, 146 
databases, 7, 8, 98, 106 
expenditures, 102, 106 
funding, 7, 105, 109 
genome initiatives, 8, 105-106, 109 
mission, 7 

RFLP mapping project, 6, 29 
university centers, 106 
Hpa I, 28 

Human Gene Mapping Library, 106, 189-190 
Human Gene Mapping Workshop, 29, 106, 144, 157 
Human Genetic Mutant Cell Repository, 31, 96, 190, 

192-193 
Human Genome Initiative 

budget, 7-8, 101 

expenditures, 7-8 

justification for, 102

management, 6 

objectives, 6, 7, 14 

recommendations on, 101 

stages, 101

workshops, 6 
human growth hormone, 59, 62, 64 
human physiology and development

genome mapping applications to, 65

NICHD-supported research, 95 
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Huntington's disease, 28, 57, 58, 64, 83, 134-136, 146, 
149 

hybridization, see in situ hybridization; somatic cell

hybridization 
hypercholesterolemia, 56, 58, 149 
hypertension, 64 

Imperial Cancer Research Fund, 148 
in situ hybridization 

cDNA mapping by, 30, 33-34 

in mapping genes to whole chromosomes, 56 

localization of fruit fly clones by, 42, 43
Index Medicus, 97
Industrial Biotechnology Association, opinions on Federal 

initiatives in mapping and sequencing, 108 
informatics 

Advanced Informatics in Medicine, 141 

Bioinformatics: Collaborative European Programs and 
Strategy, 141 

BIONET™, 193

Contextual Measures for R&D in Biotechnology,
140-141

National Biotechnology Information Center, 193 
infrastructure for genome projects 

European, 141 

Federal support, 8, 102 

resource allocation, 10 
Institute for Medical Research, somatic cell hybrid line 

repository, 31 
insulin, 21, 59, 61, 62, 64 

Integrated Genetics, DNA probe development, 58, 108 
intellectual property, protection of, see patent and copy- 
right policies 
IntelliGenetics Corp., 96 
interleukin-2, 62, 64 

International efforts on genome projects 
collaboration and cooperation, 150-159; see also col- 
laboration on genome research 
see also specific countries

International Geophysical Year, 150-151 

isoleucine, codon, 23

Italy, human genome research, 8, 133, 145-147, 195 
Japan 

automation of DNA sequencing equipment, 47-48, 137 
basic science expertise, 137 
collaboration on research, 157 

commercialization of mapping and sequencing
technologies, 133, 138
competitiveness with U.S., 133, 137-139 
cooperation with U.S., 139 
databases and repositories, 139 
expenditures on genome projects, 8-9, 138 
funding for genome research, 137 
grants program in genetics, 9 
Human Frontiers Science Program, 9, 137-138 
mapping and sequencing research, 136-138
Ministry of Education, Science, and Culture, 9, 136-137 
Ministry of International Trade and Industry, 137-138 
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peer review, 136

physical mapping of E. coli genome, 41

policy development on genome research, 136-137 

published genome research, 133, 195 

robotics technology, 137

Science and Technology Agency, 8-9, 137 

workshop on DNA sequencing technologies, 47 

karyotypes, human female, 34
karyotyping, 32-33, 56
Kennedy, John, 97
kidney diseases, 57, 58, 64
Kirschstein, Ruth, 94 n.1, 117
Koshland, Daniel, 126 

Lalouel, Jean-Marc, 146

Latin America, genome research, 149 

Lawrence Livermore National Laboratory

chromosome sorting, 100 

mapping of chromosome 19, 44, 100 

ordering of DNA clones, 100, 108
Lederberg, Joshua, 126 
leucine, codon, 23
leukemia, 64 
Levinson, Rachel, 94 n.1
libraries 

of DNA fragments, construction of, 39 

of overlapping clones, 38-39

see also repositories 
life sciences 

DOE funding for, 7 

HHMI funding for, 7 

NIH funding for, 7 

NSF funding for, 7, 8 
Lifecodes, DNA probe development, 58
ligase, 39 

%, haploid DNA content, 25
Lindberg, Donald A.B., 94 n.1
Los Alamos National Laboratory 

chromosome sorting at, 32, 100-101 

mapping of chromosome 16, 44, 100

ordering of DNA clones, 100

see also GenBank®
Lovell, Joseph, 97

low-density lipoprotein receptor, 58, 61 
lysine, codon, 23

macrophage colony stimulating factor, 64 
Massachusetts Institute of Technology, bioprocess

engineering center, 102
Maxam, Alan, 44-46 
Max Planck Society, 144 
McKusick, Victor, 24, 98
medicine 

diagnostic tool development, 56, 58-59
drug development, 62-63
human gene therapy prospects, 64
isolation of genes associated with diseases, 59-62
see also genetic diseases
meiosis, 26-27



Mendel, Gregor, 3, 73 

Mendelian Inheritance in Man, 24, 98, 106, 189 
Merriam, John, 42 
messenger RNA 
function, 23 

size of human genes, 61 
translation into protein, 23, 30 
methionine 
codon, 23 

mitochondrial genome, human origin clues from, 71 
molecular anthropology, 72 
molecular biology 

Big Science vs. small science, 125, 126-127

of human development, 65, 95

manpower in, 10 

plant, genome mapping applications to, 73 
molecular evolution 
genome mapping and DNA sequencing applications to, 

57, 68-72 
human origins, 71
primate, 70 

unanswered questions in, 69-70
monoamine oxidase, 86 
Moskowitz, Jay, 94 n.1

Mount Sinai Medical Center Institute of Human Genomic 

Studies, 100 
mouse 
beta globin gene, 65

cell hybridization, see somatic cell hybridization
genetic similarities to humans, 34, 67
genetics database, 106 
haploid DNA content, 25 

Mus musculus, amount sequenced and genome size,
47 

muscular dystrophy 

Becker's, 63 

Duchenne, 57-59, 61-63 
mutations 

artificially induced, 25 

cancer from, 25 

chromosome structural changes involved in, 25-26 

deletion, 25, 31 

detection of, 42, 56, 58, 59 

duplication, 25 

in fruit flies, 42, 68 

Human Genetic Mutant Cell Repository, 31, 96, 190,

192-193 
human rates, 72 
inversion, 26 
lethal, 42 

in nucleotide sequence, 23 
saturating screen technique for, 42 
in sex cells, 25 
in somatic cells, 25, 31 
translocation, 25-26, 31

National Academy of Sciences 
recommendations on genome projects, 107 
role in genome project oversight, 14, 15, 124
views on international cooperation, 153 
see also National Research Council
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National Institute on Mental Health, 95 
National Institutes of Health 
budgets for genome projects, 8, 93
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Pepper, Claude, 97 
Perkin-Elmer Cetus Instruments 

DNA sequencing technology, 45-46, 47
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phenylalanine hydroxylase, 61 
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size determinants, 43, 81 
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strategies, 43-44
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expenditures, 147 

Medical Research Council, 42, 44, 147 

national genome research effort, 147-148 

nonhuman genome mapping, 8, 42, 147 

published genome research, 133, 195
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Wada, Akiyoshi, 152, 156 
Walsh, James, 126 
Washington University 

collaboration with RIKEN, 157 

physical mapping of yeast genome, 41 
Watson, James D., 3, 21, 126, 152 
Weinberg, Robert, 126 
Wexler, Nancy, 135-136 
White, Raymond, 29, 126, 146 
Wilson, Allan, 126 
workshops 

on automation of DNA sequencing, 47 

on collaboration for genome projects, 187 

on costs of genome projects, 188 
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